jSymbolic is a software package for extracting high-level musical features from symbolic music representations, specifically MIDI files, as well as for iteratively developing and sharing new features. jSymbolic shares many of the design characteristics of jAudio that emphasize feature extensibility, including a modular design for adding new features, automatic provision of all other feature values to each feature, and dynamic feature extraction scheduling that automatically resolves feature dependencies.
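The dependency-resolving scheduling mentioned above can be sketched as a topological sort over declared feature dependencies. This is an illustrative sketch only, not jSymbolic's actual implementation; the feature names and dependency table below are hypothetical examples.

```python
# Illustrative sketch of automatic feature extraction scheduling:
# each feature declares which other features' values it needs, and a
# topological sort yields an order in which every feature is computed
# only after its dependencies. Uses Python's standard-library graphlib.
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency declarations (feature -> features it needs).
dependencies = {
    "PitchClassHistogram": [],
    "MostCommonPitchClass": ["PitchClassHistogram"],
    "BeatHistogram": [],
    "StrongestRhythmicPulse": ["BeatHistogram"],
}

# A safe extraction order: dependencies always come first.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

With such an ordering in hand, the extractor can simply compute features front to back, passing each feature the already-computed values it declared as dependencies.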
Low-level signal-processing features have to date dominated most music information retrieval (MIR) research. jSymbolic emphasizes more musically meaningful high-level features, which are of paramount interest to researchers in empirical musicology and music theory. Such researchers currently use relatively primitive computer processing when they use computers at all, and almost never use sophisticated machine learning tools. jSymbolic, when combined with ACE, and potentially other jMIR components, could therefore be of significant use to music researchers in the humanities as well as in MIR.
jSymbolic is packaged with a library of 111 implemented high-level features, developed through extensive analysis of publications in the fields of music theory, musicology and MIR. A further 49 features have been designed and are currently being implemented. All 160 of these features, along with the sources surveyed during their development, are documented in Cory McKay's master's thesis; they were originally developed as part of the Bodhidharma project. Most of these features had not previously been applied to MIR research, and many of them are entirely novel. The features can be loosely divided into the following seven categories:
- Instrumentation: What types of instruments are present and which are given particular importance relative to others? The importance of both pitched and non-pitched instruments and their interaction with each other is considered.
- Texture: How many independent voices are there and how do they interact (e.g., polyphonic or homophonic textures)? What is the relative importance of different voices?
- Rhythm: The time intervals between the attacks of different notes and the durations of each note are considered. What kinds of meters and rhythmic patterns are present? Is rubato used? How does rhythm vary from voice to voice?
- Dynamics: How loud are notes and what kinds of variations in dynamics occur?
- Pitch Statistics: What are the occurrence rates of different notes, in terms of both pitches and pitch classes? How tonal is the piece? What is its range? How much variety in pitch is there?
- Melody: What kinds of melodic intervals are present? How much melodic variation is there? What kinds of melodic contours are used? What types of phrases are used and how often are they repeated?
- Chords: What vertical intervals are present? What types of chords do they represent? How much harmonic movement is there, and how fast is it?
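As a concrete illustration of one of the pitch-statistics features described above, the sketch below computes a normalized pitch class histogram from a list of MIDI pitch numbers. The function name and input format are illustrative assumptions, not jSymbolic's actual API.

```python
# Minimal sketch of a pitch-statistics feature: a normalized pitch
# class histogram. MIDI pitch numbers are folded into 12 pitch classes
# (pitch mod 12), then counts are normalized to fractions of all notes.
def pitch_class_histogram(midi_pitches):
    """Return 12 fractions giving how often each pitch class occurs."""
    counts = [0] * 12
    for pitch in midi_pitches:
        counts[pitch % 12] += 1
    total = sum(counts) or 1  # avoid division by zero on empty input
    return [c / total for c in counts]

# C major triad spread over two octaves: C4, E4, G4, C5.
hist = pitch_class_histogram([60, 64, 67, 72])
print(hist[0])  # pitch class C appears in 2 of 4 notes -> 0.5
```

Features such as "most common pitch class" or "pitch class variety" can then be derived directly from the resulting 12-bin histogram.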
A special emphasis was placed on ensuring a diversity of features, which is part of the reason why so many were implemented. General-purpose suites such as jMIR must be able to deal with arbitrary types of music, and features relevant to some kinds of music may be useless for others. jSymbolic thus presents users with a very large palette of features from which to choose, based either on their own musical expertise or on ACE's automatic feature selection functionality.
Both one and multi-dimensional features are present in jSymbolic, including a number of features based on or derived from histograms. These include beat histograms and a variety of histograms based on pitch and pitch class, as well as instrumentation histograms, melodic interval histograms, vertical interval histograms, and chord type histograms. Feature values can be saved to either ACE XML or Weka ARFF files.
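The ARFF output mentioned above can be sketched as follows. This is a minimal illustration of the Weka ARFF text format itself, not jSymbolic's own writer; the relation name, attribute names and values are hypothetical.

```python
# Minimal sketch of serializing extracted feature values as a Weka
# ARFF file: a @RELATION header, one NUMERIC @ATTRIBUTE per feature,
# then comma-separated feature vectors after @DATA.
def to_arff(relation, feature_names, rows):
    lines = [f"@RELATION {relation}", ""]
    for name in feature_names:
        lines.append(f"@ATTRIBUTE {name} NUMERIC")
    lines += ["", "@DATA"]
    for row in rows:
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines)

# Two hypothetical pieces, each described by two feature values.
arff = to_arff("midi_features",
               ["Average_Note_Duration", "Range"],
               [[0.42, 36], [0.15, 48]])
print(arff)
```

A file written this way can be loaded directly by Weka for classification or clustering experiments.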
McKay, C. 2010. Automatic music classification with jMIR. Ph.D. Thesis. McGill University, Canada.
McKay, C., J. A. Burgoyne, J. Hockman, J. B. L. Smith, G. Vigliensoni, and I. Fujinaga. 2010. Evaluating the genre classification performance of lyrical features relative to audio, symbolic and cultural features. Proceedings of the International Society for Music Information Retrieval Conference. 213–8.
McKay, C., and I. Fujinaga. 2010. Improving automatic music classification performance by extracting features from different types of data. Proceedings of the ACM SIGMM International Conference on Multimedia Information Retrieval. 257–66.
McKay, C., and I. Fujinaga. 2008. Combining features extracted from audio, symbolic and cultural sources. Proceedings of the International Conference on Music Information Retrieval. 597–602.
McKay, C., and I. Fujinaga. 2007. Style-independent computer-assisted exploratory analysis of large music collections. Journal of Interdisciplinary Music Studies 1 (1): 63–85.
McKay, C., and I. Fujinaga. 2006. jSymbolic: A feature extractor for MIDI files. Proceedings of the International Computer Music Conference. 302–5.