jAudio 2

Updated Version

An updated version of jAudio has been published by Daniel McEnnis at https://github.com/dmcennis/jaudioGIT. It has been developed separately from this earlier version still packaged with jMIR on SourceForge, and contains a number of useful improvements.

Overview

jAudio is a software package for extracting features from audio files as well as for iteratively developing and sharing new features. These extracted features can then be used in many areas of music information retrieval (MIR) research, often via processing with a machine learning framework such as ACE.

The iterative approach to feature development, which is emphasized in the design of all jMIR components, is particularly important with respect to audio feature extraction, where low-level features can be combined to build increasingly high-level and musically meaningful features. A number of aspects of jAudio facilitate such iterative feature development. For example, jAudio uses a modular plugin interface, so new features can be added without modifying or recompiling the core code. One need only place a newly compiled feature in a plugin folder and add a reference to it in an XML configuration file, which can refer to remote URLs as well as local file paths.
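The plugin idea can be illustrated with a minimal sketch. The `FeaturePlugin` interface, its method names and the `load` helper below are hypothetical and are not jAudio's actual classes; the sketch only assumes that a feature receives samples as an array and can be instantiated from a class name read from a configuration file.

```java
public class PluginSketch {

    /** Hypothetical plugin contract (an assumption, not jAudio's real API). */
    public interface FeaturePlugin {
        String name();
        String[] dependencies();
        double[] extract(double[] samples, double samplingRate, double[][] dependencyValues);
    }

    /** Example feature: root mean square of one window of samples. */
    public static class RootMeanSquare implements FeaturePlugin {
        public String name() { return "Root Mean Square"; }

        public String[] dependencies() { return new String[0]; }

        public double[] extract(double[] samples, double samplingRate, double[][] dependencyValues) {
            double sumOfSquares = 0.0;
            for (double s : samples) {
                sumOfSquares += s * s;
            }
            return new double[] { Math.sqrt(sumOfSquares / samples.length) };
        }
    }

    /** Instantiates a plugin from its class name, so the caller needs no compile-time reference to it. */
    public static FeaturePlugin load(String className) throws ReflectiveOperationException {
        return (FeaturePlugin) Class.forName(className).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // In a real system the class name would come from the configuration file.
        FeaturePlugin plugin = load(RootMeanSquare.class.getName());
        double[] window = {3.0, -3.0, 3.0, -3.0};
        System.out.println(plugin.name() + ": " + plugin.extract(window, 44100.0, new double[0][])[0]);
    }
}
```

Because the plugin is looked up by name at runtime, adding a feature in this sketch means adding a class and a configuration entry, with no change to the loading code.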

As is the case with all three jMIR feature extractors, jAudio gives every feature access to the values of all other extracted features, and resolves feature dependencies automatically so that feature extraction can be scheduled in the correct order. jAudio also provides audio samples to features as simple arrays, so that researchers do not need to deal directly with Java’s somewhat arcane audio interface or with low-level issues such as buffering or audio format conversions.
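One common way to schedule extraction from declared dependencies is a topological sort. The sketch below is a generic depth-first version under that assumption; it is not taken from jAudio's source, and the class name, method name and feature names in `main` are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FeatureScheduler {

    /**
     * Returns feature names ordered so that every feature appears after
     * all of the features it depends on.
     */
    public static List<String> extractionOrder(Map<String, List<String>> dependencies) {
        List<String> ordered = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String feature : dependencies.keySet()) {
            visit(feature, dependencies, done, inProgress, ordered);
        }
        return ordered;
    }

    private static void visit(String feature, Map<String, List<String>> dependencies,
                              Set<String> done, Set<String> inProgress, List<String> ordered) {
        if (done.contains(feature)) {
            return;
        }
        if (!inProgress.add(feature)) {
            throw new IllegalStateException("Circular dependency involving " + feature);
        }
        // Schedule everything this feature needs before the feature itself.
        for (String dependency : dependencies.getOrDefault(feature, List.of())) {
            visit(dependency, dependencies, done, inProgress, ordered);
        }
        inProgress.remove(feature);
        done.add(feature);
        ordered.add(feature);
    }

    public static void main(String[] args) {
        // Illustrative dependency declarations.
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("Spectral Centroid", List.of("Power Spectrum"));
        deps.put("Power Spectrum", List.of());
        deps.put("Derivative of Spectral Centroid", List.of("Spectral Centroid"));
        System.out.println(extractionOrder(deps));
    }
}
```

With the declarations in `main`, the power spectrum is scheduled first, then the centroid, then its derivative, regardless of the order in which the features were listed.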

Automated “metafeature” extraction is another advantage of jAudio. Metafeatures are template-derived features that can be extracted from one or more other features. Examples of metafeatures implemented in jAudio include Running Mean, Running Standard Deviation and Derivative. To illustrate how metafeatures work, consider a researcher who has implemented an imagined feature named “tonal energy” and has added it as a plugin to jAudio. Users would then automatically have the option at runtime of whether or not to extract each metafeature for this new feature, without the implementer of tonal energy needing to implement any code for calculating quantities such as how a feature is changing from window to window (Derivative). Metafeatures can additionally be chained together (e.g., derivative of running mean), and developers are free to implement additional metafeatures which can then automatically be applied to existing features without modifying them.
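As a rough illustration, metafeatures can be thought of as functions from one sequence of per-window values to another. The sketch below assumes a scalar feature value per window, a Derivative whose first output is 0, and a Running Mean over a fixed number of windows; these choices and all names are assumptions made for the example rather than jAudio's implementation.

```java
import java.util.Arrays;

public class MetafeatureSketch {

    /** Window-to-window change: out[i] = values[i] - values[i - 1], with out[0] = 0. */
    public static double[] derivative(double[] values) {
        double[] out = new double[values.length];
        for (int i = 1; i < values.length; i++) {
            out[i] = values[i] - values[i - 1];
        }
        return out;
    }

    /** Mean of the current window and up to (span - 1) preceding windows. */
    public static double[] runningMean(double[] values, int span) {
        double[] out = new double[values.length];
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
            if (i >= span) {
                sum -= values[i - span]; // drop the value that left the span
            }
            out[i] = sum / Math.min(i + 1, span);
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up per-window values of the imagined "tonal energy" feature.
        double[] tonalEnergy = {1.0, 3.0, 5.0, 11.0};
        // Chaining: derivative of running mean.
        double[] chained = derivative(runningMean(tonalEnergy, 2));
        System.out.println(Arrays.toString(chained));
    }
}
```

Neither function knows anything about tonal energy, which is the point: the same templates apply to any feature that produces a value per window.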

“Aggregators” are an additional type of functionality offered by jAudio. Aggregators are functions that collapse a sequence of separate vectors into a single vector or a smaller sequence of vectors. jAudio considers two basic types of aggregators. The first, simpler type consists of functions that can be applied to the windowed values of any single feature. Examples include the Standard Deviation and Mean aggregators, which project all values of any feature over all windows into a single standard deviation value and a single mean value, respectively. Such aggregators can be very helpful when dealing with potentially huge amounts of feature data and attempting to come to terms with the “curse of dimensionality.”
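The first type of aggregator can be sketched as follows. The code assumes the windowed values of one feature are stored as one row per window and one column per feature dimension, and it uses the population (divide by N) form of the standard deviation; both are assumptions for illustration, and the class and method names are hypothetical.

```java
public class AggregatorSketch {

    /** Mean of each feature dimension over all windows. */
    public static double[] mean(double[][] windows) {
        int dims = windows[0].length;
        double[] result = new double[dims];
        for (double[] window : windows) {
            for (int d = 0; d < dims; d++) {
                result[d] += window[d];
            }
        }
        for (int d = 0; d < dims; d++) {
            result[d] /= windows.length;
        }
        return result;
    }

    /** Population standard deviation of each feature dimension over all windows. */
    public static double[] standardDeviation(double[][] windows) {
        double[] means = mean(windows);
        double[] result = new double[means.length];
        for (double[] window : windows) {
            for (int d = 0; d < means.length; d++) {
                double diff = window[d] - means[d];
                result[d] += diff * diff;
            }
        }
        for (int d = 0; d < result.length; d++) {
            result[d] = Math.sqrt(result[d] / windows.length);
        }
        return result;
    }

    public static void main(String[] args) {
        // Three windows of a two-dimensional feature (made-up values).
        double[][] windows = {{1.0, 10.0}, {3.0, 10.0}, {5.0, 10.0}};
        System.out.println(java.util.Arrays.toString(mean(windows)));
        System.out.println(java.util.Arrays.toString(standardDeviation(windows)));
    }
}
```

However many windows the recording produces, each aggregator returns one vector whose length equals the feature's dimensionality.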

The second type of aggregator can be applied to multiple different features. For example, the Area of Moments Aggregator takes in any set of different input features, treats their combined sequence of vectors as a two-dimensional image matrix, and calculates two-dimensional moments for this matrix. Such aggregators are useful in representing in a low-dimensional way how different features change together, something that can be highly musically significant, but is too often ignored in MIR systems. jAudio also implements another aggregator of this type named Multiple Feature Histogram. Users of jAudio are free to implement custom aggregators of their own, and to target existing aggregators as they wish.
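To make the image-matrix idea concrete, the sketch below computes standard two-dimensional raw moments of a matrix whose rows are windows and whose columns are feature dimensions. Which moments the Area of Moments aggregator actually reports, and how it lays out the matrix, are not specified above, so this is a generic image-moment calculation under assumed conventions with hypothetical names.

```java
public class MomentsSketch {

    /** Raw moment M_pq = sum over x and y of x^p * y^q * image[x][y]. */
    public static double rawMoment(double[][] image, int p, int q) {
        double sum = 0.0;
        for (int x = 0; x < image.length; x++) {
            for (int y = 0; y < image[x].length; y++) {
                sum += Math.pow(x, p) * Math.pow(y, q) * image[x][y];
            }
        }
        return sum;
    }

    /**
     * Returns {area, centroid along the window axis, centroid along the
     * feature axis}. Assumes the total mass (area) is non-zero.
     */
    public static double[] areaAndCentroid(double[][] image) {
        double area = rawMoment(image, 0, 0);
        double centroidX = rawMoment(image, 1, 0) / area;
        double centroidY = rawMoment(image, 0, 1) / area;
        return new double[] {area, centroidX, centroidY};
    }

    public static void main(String[] args) {
        // Two windows (rows) of two concatenated feature values (columns).
        double[][] image = {{1.0, 1.0}, {1.0, 1.0}};
        System.out.println(java.util.Arrays.toString(areaAndCentroid(image)));
    }
}
```

A handful of such moments summarizes how the combined features are distributed across both time and feature dimension, which is what makes the representation low-dimensional.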

jAudio’s current distribution includes 28 implemented basic features, and metafeatures and aggregators can of course be used to greatly expand this number. Some of these features are standard features with proven efficacy, and others are more innovative and are presented to the research community for experimentation.

jAudio includes a GUI for general-purpose use, an API for those interested in embedding jAudio in their own applications and a command line interface to facilitate scripting. Users may select which features, metafeatures and aggregators to extract, and can also set general parameters such as window size, window overlap, downsampling and amplitude normalization. Some individual features also allow specific additional parameters to be set. Configuration files can be saved so that settings can be reused.
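The interaction between window size and window overlap can be sketched as below. The code assumes overlap is expressed as a fraction of the window size in the range [0, 1) and that the final partial window is zero-padded; both are assumptions for the example, and `WindowingSketch.split` is a hypothetical helper rather than part of jAudio's API.

```java
public class WindowingSketch {

    /** Splits samples into windows of windowSize, each starting one hop after the previous. */
    public static double[][] split(double[] samples, int windowSize, double overlap) {
        if (windowSize <= 0 || overlap < 0.0 || overlap >= 1.0) {
            throw new IllegalArgumentException("windowSize must be positive and overlap in [0, 1)");
        }
        // Distance between the starts of consecutive windows.
        int hop = Math.max(1, (int) Math.round(windowSize * (1.0 - overlap)));
        int count = (samples.length + hop - 1) / hop; // ceiling division
        double[][] windows = new double[count][windowSize];
        for (int w = 0; w < count; w++) {
            int start = w * hop;
            int length = Math.min(windowSize, samples.length - start);
            // Positions past the end of the signal keep their default value of 0.
            System.arraycopy(samples, start, windows[w], 0, length);
        }
        return windows;
    }

    public static void main(String[] args) {
        double[] samples = new double[10];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = i;
        }
        double[][] windows = split(samples, 4, 0.5);
        System.out.println(windows.length + " windows; second window starts at sample " + windows[1][0]);
    }
}
```

With 10 samples, a window size of 4 and an overlap of 0.5, the hop is 2 samples and the sketch yields 5 windows, so larger overlaps produce more windows, and more feature values, for the same recording.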

jAudio can perform several basic types of audio synthesis, record audio and convert MIDI files to audio in order to facilitate the testing of new features. For similar reasons, the software can also display audio signals in both the frequency and time domains. jAudio can parse MP3, WAV, AIFF, AIFC, AU and SND files. Feature values can be saved to either ACE XML or Weka ARFF files, and users have the option of saving the features extracted for each individual window or only values aggregated over all windows.
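For readers unfamiliar with ARFF, the sketch below writes numeric feature values in the basic Weka ARFF layout (`@RELATION`, `@ATTRIBUTE`, `@DATA`). The relation name, attribute names and helper are hypothetical, and the exact attributes and ordering that jAudio itself emits are not shown here.

```java
import java.util.List;

public class ArffSketch {

    /** Builds ARFF text with one numeric attribute per name and one data line per row. */
    public static String toArff(String relation, List<String> attributeNames, List<double[]> rows) {
        StringBuilder sb = new StringBuilder();
        // The relation name is written unquoted, so it must not contain spaces.
        sb.append("@RELATION ").append(relation).append("\n\n");
        for (String name : attributeNames) {
            // Quote attribute names so that names containing spaces remain valid.
            sb.append("@ATTRIBUTE \"").append(name).append("\" NUMERIC\n");
        }
        sb.append("\n@DATA\n");
        for (double[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(row[i]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One row per recording when only aggregated values are saved (made-up numbers).
        System.out.print(toArff("jAudio",
                List.of("Tonal Energy Mean", "Tonal Energy Standard Deviation"),
                List.of(new double[] {0.42, 0.07})));
    }
}
```

Saving per-window values instead would simply produce one data row per window rather than one per recording.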

NOTE: This page has been translated into Serbo-Croatian by Vera Djuraskovic from webhostinggeeks.com. It has also been translated into Armenian by Gajk Melikyan.

Screen Shot (modified artificially to show two menus simultaneously)

Related Publications and Presentations

McKay, C. 2010. Automatic music classification with jMIR. Ph.D. Thesis. McGill University, Canada.

McKay, C., J. A. Burgoyne, J. Hockman, J. B. L. Smith, G. Vigliensoni, and I. Fujinaga. 2010. Evaluating the genre classification performance of lyrical features relative to audio, symbolic and cultural features. Proceedings of the International Society for Music Information Retrieval Conference. 213–8.

McKay, C., and I. Fujinaga. 2010. Improving automatic music classification performance by extracting features from different types of data. Proceedings of the ACM SIGMM International Conference on Multimedia Information Retrieval. 257–66.

McKay, C., and I. Fujinaga. 2008. Combining features extracted from audio, symbolic and cultural sources. Proceedings of the International Conference on Music Information Retrieval. 597–602.

McEnnis, D., C. McKay, and I. Fujinaga. 2006. jAudio: Additions and improvements. Proceedings of the International Conference on Music Information Retrieval. 385–6.

McEnnis, D., C. McKay, and I. Fujinaga. 2006. Overview of OMEN. Proceedings of the International Conference on Music Information Retrieval. 7–12.

McEnnis, D., C. McKay, I. Fujinaga, and P. Depalle. 2005. jAudio: A feature extraction library. Proceedings of the International Conference on Music Information Retrieval. 600–3.

Questions and Comments

Daniel McEnnis: dmcennis@gmail.com
Cory McKay: cory.mckay@mail.mcgill.ca

DOWNLOAD FROM SOURCEFORGE

NOTE: jAudio also has its own separate SourceForge project site that includes source code in the CVS section.
