jSymbolic is a software package for extracting high-level musical features from symbolic music representations, specifically MIDI files, as well as for iteratively developing and sharing new features. jSymbolic shares many of the design characteristics of jAudio that emphasize feature extensibility. These include a modular design for adding new features, automatic provision of all other feature values to each feature, and dynamic feature extraction scheduling that automatically resolves feature dependencies.
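The dependency-resolving scheduling described above can be illustrated with a short sketch. This is not jSymbolic's actual code (jSymbolic is written in Java and its internal API is not shown here). The feature names, the `FEATURES` registry and the `extract` function are all hypothetical, and the input is assumed to be a plain list of MIDI pitch numbers. Each feature declares the features it depends on, and a topological sort decides the extraction order so that every feature can read previously computed values.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Hypothetical features. Each takes the note list plus a dict holding
# the values of every feature computed so far.
def pitch_histogram(notes, values):
    hist = [0] * 128  # one bin per MIDI pitch number
    for pitch in notes:
        hist[pitch] += 1
    return hist


def most_common_pitch(notes, values):
    hist = values["pitch_histogram"]
    return hist.index(max(hist))


def pitch_range(notes, values):
    hist = values["pitch_histogram"]
    sounded = [p for p, count in enumerate(hist) if count > 0]
    return sounded[-1] - sounded[0]


# Hypothetical registry: feature name -> (dependencies, function).
FEATURES = {
    "pitch_range": (["pitch_histogram"], pitch_range),
    "most_common_pitch": (["pitch_histogram"], most_common_pitch),
    "pitch_histogram": ([], pitch_histogram),
}


def extract(notes, requested):
    """Compute the requested features, resolving dependencies first."""
    # Collect the requested features and all transitive dependencies.
    needed = {}
    stack = list(requested)
    while stack:
        name = stack.pop()
        if name not in needed:
            deps, _ = FEATURES[name]
            needed[name] = set(deps)
            stack.extend(deps)

    # static_order() yields each feature only after its dependencies.
    values = {}
    for name in TopologicalSorter(needed).static_order():
        values[name] = FEATURES[name][1](notes, values)

    # Report only what the caller asked for.
    return {name: values[name] for name in requested}
```

In this sketch, a user who requests only `pitch_range` never has to know that `pitch_histogram` must be computed first. Adding a new feature is a matter of adding one function and one registry entry.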

Low-level signal-processing features have to date dominated most music information retrieval (MIR) research. jSymbolic emphasizes more musically meaningful high-level features, which are of paramount interest to researchers in empirical musicology and music theory. Such researchers currently use relatively primitive computer processing when they use computers at all, and almost never use sophisticated machine learning tools. jSymbolic, when combined with ACE, and potentially other jMIR components, could therefore be of significant use to music researchers in the humanities as well as in MIR.

jSymbolic is packaged with a library of 111 implemented high-level features, which were developed through extensive analysis of publications in the fields of music theory, musicology and MIR. Another 49 features have been designed and are currently being implemented. All 160 of these features, and the sources surveyed during their development, are documented in Cory McKay's master's thesis; the feature library was originally developed as part of the Bodhidharma project. Most of these features had not previously been applied to MIR research, and many of them are entirely novel. The features can be loosely divided into seven categories.

A special emphasis was placed on ensuring a diversity of features, which is part of the reason why so many features were implemented. General-purpose suites such as jMIR must be able to deal with arbitrary types of music, and the features that might be relevant to certain types of music might be useless with respect to others. jSymbolic thus presents users with a very large palette of features from which to choose based on either their own musical expertise or ACE’s automatic feature selection functionality.

Both one-dimensional and multi-dimensional features are present in jSymbolic, including a number of features based on or derived from histograms. These include beat histograms and a variety of histograms based on pitch and pitch class, as well as instrumentation histograms, melodic interval histograms, vertical interval histograms, and chord type histograms. Feature values can be saved to either ACE XML or Weka ARFF files.
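As a rough illustration of histogram-based features, the sketch below computes a pitch class histogram and a melodic interval histogram from a list of MIDI pitch numbers, then serializes a feature vector in Weka's ARFF text format. The function names, the normalization of the pitch class histogram, and the one-octave cap on intervals are assumptions made for this example and are not taken from jSymbolic's own definitions.

```python
def pitch_class_histogram(pitches):
    """Fraction of notes in each of the 12 pitch classes (C = 0)."""
    hist = [0.0] * 12
    for pitch in pitches:
        hist[pitch % 12] += 1
    total = sum(hist)
    # Normalizing is an assumption of this sketch; empty input stays all zeros.
    return [count / total for count in hist] if total else hist


def melodic_interval_histogram(pitches, max_interval=12):
    """Count absolute intervals, in semitones, between consecutive notes."""
    hist = [0] * (max_interval + 1)
    for previous, current in zip(pitches, pitches[1:]):
        interval = abs(current - previous)
        if interval <= max_interval:  # larger leaps are ignored here
            hist[interval] += 1
    return hist


def to_arff(relation, attribute_names, rows):
    """Serialize numeric feature vectors as a minimal ARFF document."""
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in attribute_names]
    lines += ["", "@data"]
    lines += [",".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"


# Usage: a C major arpeggio (C4, E4, G4, C5) as MIDI pitch numbers.
melody = [60, 64, 67, 72]
pc_hist = pitch_class_histogram(melody)
names = [f"pitch_class_{i}" for i in range(12)]
arff_text = to_arff("example_features", names, [pc_hist])
```

A multi-dimensional feature such as the pitch class histogram becomes one ARFF attribute per bin in this sketch, while a one-dimensional feature would occupy a single attribute.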

Related Publications

McKay, C. 2010. Automatic music classification with jMIR. Ph.D. Thesis. McGill University, Canada.

McKay, C., J. A. Burgoyne, J. Hockman, J. B. L. Smith, G. Vigliensoni, and I. Fujinaga. 2010. Evaluating the genre classification performance of lyrical features relative to audio, symbolic and cultural features. Proceedings of the International Society for Music Information Retrieval Conference. 213–8.

McKay, C., and I. Fujinaga. 2010. Improving automatic music classification performance by extracting features from different types of data. Proceedings of the ACM SIGMM International Conference on Multimedia Information Retrieval. 257–66.

McKay, C., and I. Fujinaga. 2008. Combining features extracted from audio, symbolic and cultural sources. Proceedings of the International Conference on Music Information Retrieval. 597–602.

McKay, C., and I. Fujinaga. 2007. Style-independent computer-assisted exploratory analysis of large music collections. Journal of Interdisciplinary Music Studies 1 (1): 63–85.

McKay, C., and I. Fujinaga. 2006. jSymbolic: A feature extractor for MIDI files. Proceedings of the International Computer Music Conference. 302–5.
