jSongMiner is a software package for auto-identifying songs and extracting
metadata about them from various sources on the web and elsewhere. This software
was originally designed for use in the context of digital libraries, but it
can certainly be adopted for other purposes as well. For example, it could
be used as a means of obtaining cultural features for automatic music classification,
or even for annotating personal music collections. For those users who do
wish to use jSongMiner in the context of a digital library, it has been designed
specifically to be integrated with the Greenstone
digital library software, and Greenstone modules have been implemented
for using jSongMiner to build Greenstone collections.
jSongMiner begins by identifying unknown audio files using fingerprinting.
Alternatively, it can identify songs through metadata queries, based either on
metadata embedded in an audio file or on metadata already known about a
song. Once jSongMiner has identified a song, it can then extract metadata
about the song from various sources, such as The Echo Nest, Last.FM and
MusicBrainz web services, or from miscellaneous metadata embedded in the audio
file. In addition to extracting metadata about songs, jSongMiner can also
extract metadata about the artists and albums associated with songs, treating
them as distinct resources.
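The ordered fallback from fingerprinting to metadata queries can be pictured as a chain of identification strategies, each tried in turn until one succeeds. The sketch below is a minimal illustration under that assumption only. `Strategy`, `identify()` and the stand-in lambdas are hypothetical names, not jSongMiner's actual API, and no real fingerprinting or web query is performed.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch: Strategy and identify() are illustrative names,
// not part of jSongMiner's actual API.
public class IdentificationChain {

    /** A named identification strategy; attempt returns an ID, or empty on failure. */
    record Strategy(String name, Function<Map<String, String>, Optional<String>> attempt) {}

    /** Tries each strategy in order and returns "strategyName:id" for the first success. */
    static Optional<String> identify(List<Strategy> strategies, Map<String, String> info) {
        for (Strategy s : strategies) {
            Optional<String> id = s.attempt().apply(info);
            if (id.isPresent()) {
                return Optional.of(s.name() + ":" + id.get());
            }
        }
        return Optional.empty(); // every strategy failed
    }

    public static void main(String[] args) {
        // Stand-in for fingerprinting; it always fails here, as if no codegen binary were installed.
        Strategy fingerprint = new Strategy("fingerprint", info -> Optional.empty());
        // Stand-in for a metadata query: succeeds when both title and artist are known.
        Strategy metadataQuery = new Strategy("metadata-query", info ->
                info.containsKey("title") && info.containsKey("artist")
                        ? Optional.of(info.get("artist") + " - " + info.get("title"))
                        : Optional.empty());

        Map<String, String> known = Map.of("title", "So What", "artist", "Miles Davis");
        // Prints "metadata-query:Miles Davis - So What", since fingerprinting fails first.
        System.out.println(identify(List.of(fingerprint, metadataQuery), known).orElse("unidentified"));
    }
}
```

Keeping the strategies in a plain ordered list means a new identification method can be added, or the order changed, without touching the loop itself.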
Once metadata has been extracted relating to a song, artist or album, this
metadata can be saved as an ACE XML 1.1 Classifications
file or as a return-delimited text file, or it can simply be printed to standard
out. Each piece of metadata extracted by jSongMiner includes the metadata
field label, the metadata value and the source from which the metadata was
derived. Users can also opt to have the extracted metadata presented using
unqualified or qualified Dublin Core tags, if desired.
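Because every extracted value carries its field label and its source, a natural in-memory representation is a small record with those three parts. The sketch below assumes that representation and shows one possible return-delimited text rendering, with the optional URL-encoding step that jSongMiner offers for diverse character sets. `MetadataEntry`, `toText()` and the line layout (three lines per entry, blank line between entries) are hypothetical and are not jSongMiner's actual file format.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: MetadataEntry and toText() are illustrative,
// and the output layout is an assumption, not jSongMiner's actual file format.
public class MetadataText {

    /** One extracted piece of metadata: field label, value, and the source it came from. */
    record MetadataEntry(String label, String value, String source) {}

    /** URL-encodes a string (UTF-8) when requested, so non-ASCII text survives in plain files. */
    static String encode(String s, boolean urlEncode) {
        return urlEncode ? URLEncoder.encode(s, StandardCharsets.UTF_8) : s;
    }

    /** Writes each entry as three lines (label, value, source); entries are separated by a blank line. */
    static String toText(List<MetadataEntry> entries, boolean urlEncode) {
        return entries.stream()
                .map(e -> encode(e.label(), urlEncode) + "\n"
                        + encode(e.value(), urlEncode) + "\n"
                        + encode(e.source(), urlEncode) + "\n")
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        List<MetadataEntry> entries = List.of(
                new MetadataEntry("Title", "Jóga", "MusicBrainz"),
                new MetadataEntry("Similar Song", "Hyperballad", "The Echo Nest"));
        System.out.print(toText(entries, false));
        System.out.println("---");
        System.out.print(toText(entries, true)); // e.g. "Jóga" becomes "J%C3%B3ga"
    }
}
```

Note that `URLEncoder` applies form encoding, so spaces become `+` (for example, "Similar Song" is written as "Similar+Song"); `URLDecoder.decode` with the same charset reverses it.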
In all, jSongMiner can extract well over 100 song, artist and album fields.
Many of these fields can have multiple values (e.g. there may be multiple
songs similar to a given song).
Like the rest of jMIR, jSongMiner is open-source and available for free.
It is implemented in Java in order to maximize cross-platform compatibility.
The one exception to this is its use of the Echo Nest fingerprinting codegen
binary, but the Echo Nest provides versions of this binary for use with Windows,
OS X and Linux. In any case, jSongMiner can certainly be used without the
local fingerprinting functionality offered by the Echo Nest codegen if necessary.
Advantages of jSongMiner
- jSongMiner provides an integrated and unified framework for acquiring many
types of metadata from diverse sources of information about music. It keeps
a record of resource identifiers in as many namespaces as possible while doing
this, thus facilitating the integration of information from different sources.
- jSongMiner provides several different mechanisms for identifying audio files,
so that if one approach fails another may work. jSongMiner also extracts some
of the same metadata fields from different sources redundantly, so that if
a given piece of information is missing or incorrect on one source, it may
still be correctly found on another source.
- jSongMiner permits metadata extraction for both unidentified audio files
and abstract pieces of music for which identifying information (e.g. title,
artist and album) is known.
- jSongMiner keeps track of where each piece of metadata was extracted from.
- jSongMiner provides a command line and configuration file interface that
gives users a great deal of flexibility in choosing what is extracted,
where it is extracted from and how the extracted results are presented.
- jSongMiner is entirely open-source and free, and is as platform-independent
as possible.
- jSongMiner has a highly extensible modular architecture that will facilitate
the addition of new sources of information to the jSongMiner framework
in the future, as well as the incorporation of future improvements to existing ones.
- jSongMiner is designed to be easily integrated into other software, either
through its very well-documented and accessible API, or through the use of
its command line and configuration file interface. jSongMiner also allows
known metadata about a song to be entered at the command line, so that information
known to be true by users or other software can be easily incorporated into
jSongMiner extraction queries.
- jSongMiner is designed specifically to be easily and usefully integrated
into digital library frameworks, and has already been incorporated into the
Greenstone digital library framework.
- jSongMiner provides the option of presenting extracted metadata in the form
of unqualified and/or qualified Dublin Core fields. It includes an original
Dublin Core schema carefully designed to suit the kinds of information that
can be mined from the web and embedded metadata tags.
- jSongMiner is also designed specifically to be used in automatic music classification
research. The integration of jSongMiner into the jMIR framework and its production
of ACE XML files facilitate this. For example, one might use the song identifiers
found by jSongMiner to extract classification features through the web, using
software such as jWebMiner, or one might use
the metadata extracted by jSongMiner itself directly as features.
- jSongMiner provides users with the option of saving extracted metadata as
ACE XML or return-delimited text, and/or having it printed to standard out.
These choices allow users to choose the output format that is best suited
to their needs (e.g. ACE XML for use in automatic classification, text for
easy parsing at a later date or standard out for immediate access or post-processing
by other software). jSongMiner also includes utility functions for use in
translating and quickly interpreting previously saved metadata.
- jSongMiner allows the storage of metadata containing diverse character sets
by providing the option of URL-encoding saved metadata.
- jSongMiner allows users to choose whether saved files about resources
for which metadata has been extracted should be auto-named with content-derived
unique identifiers, or whether file names specified at the command line should be used.
- jSongMiner allows users to treat songs, artists and albums as separate resource
types, and allows information to be extracted and saved independently for
each of them, whilst at the same time maintaining information outlining the
links between resources of the same and different types. Users also have the
option of instead packaging artist and album metadata with song metadata,
if they prefer.
- jSongMiner keeps logs of artists and albums for which metadata has already
been extracted, so that users can avoid repeatedly re-extracting the same
metadata for an artist or album when processing another song associated with it.
- jSongMiner allows simple access to sources of information on the web even
if they must be accessed through a proxy server.
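The logging of already-processed artists and albums amounts to checking each artist or album identifier against a record of those seen before and skipping the costly web queries on a repeat. The sketch below assumes in-memory sets for that record; jSongMiner's own logs may well be stored differently. `ExtractionLog`, `processSong()` and `extract()` are hypothetical names, and `extract()` merely prints and counts instead of querying any web service.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: ExtractionLog and its methods are illustrative,
// not jSongMiner's actual API; logs are kept in memory only.
public class ExtractionLog {
    private final Set<String> extractedArtists = new HashSet<>();
    private final Set<String> extractedAlbums = new HashSet<>();
    private int extractionCount = 0;

    /** Stand-in for a costly metadata extraction; here it only prints and counts. */
    private void extract(String kind, String id) {
        extractionCount++;
        System.out.println("Extracting " + kind + " metadata for " + id);
    }

    /** Processes a song, extracting artist and album metadata only the first time each is seen. */
    void processSong(String title, String artistId, String albumId) {
        extract("song", title);
        // Set.add returns false when the ID is already logged, so repeats are skipped.
        if (extractedArtists.add(artistId)) {
            extract("artist", artistId);
        }
        if (extractedAlbums.add(albumId)) {
            extract("album", albumId);
        }
    }

    int extractionCount() {
        return extractionCount;
    }

    public static void main(String[] args) {
        ExtractionLog log = new ExtractionLog();
        log.processSong("So What", "miles-davis", "kind-of-blue");
        log.processSong("Freddie Freeloader", "miles-davis", "kind-of-blue");
        System.out.println(log.extractionCount()); // 4: the second song reuses the logged artist and album
    }
}
```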
Many more details on jSongMiner are available in the jSongMiner manual.