jWebMiner is a software package for extracting cultural features from text data on the web using web services. At its most basic level, the software operates by automatically searching the web to acquire statistics on how often particular strings co-occur on the same web pages. These co-occurrence counts are then weighted statistically in a variety of ways based on how often each search string occurs overall.
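The weighting idea can be illustrated with a minimal sketch. The formula below is only one plausible normalization, chosen for illustration, and is not necessarily one of the weightings jWebMiner itself uses. The function name and the hit counts are hypothetical.

```python
def normalized_cooccurrence(hits_a, hits_b, hits_both):
    """Score how strongly two strings co-occur, relative to how
    often each string occurs on its own.

    Assumption: this symmetric average of two ratios is an
    illustrative choice, not jWebMiner's documented formula.
    """
    if hits_a == 0 or hits_b == 0:
        return 0.0
    # Fraction of pages containing A that also contain B, and vice
    # versa, averaged so that the score does not depend on order.
    return 0.5 * (hits_both / hits_a + hits_both / hits_b)


# Hypothetical hit counts, made up for this example.
hits = {"Miles Davis": 1000, "John Coltrane": 800}
hits_together = 400

score = normalized_cooccurrence(
    hits["Miles Davis"], hits["John Coltrane"], hits_together
)
```

Dividing by the overall counts keeps very common strings from dominating the results simply because they appear on many pages.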
There is musicological and psychological evidence that cultural associations and expectations outside the scope of the content of musical signals play an essential role in how humans interpret and organize music. This is of particular relevance to music information retrieval (MIR) research in similarity analysis and classification.
jWebMiner begins by parsing iTunes XML, ACE XML, Weka ARFF or text files in order to access fields and their associated values to use in searches. Users can then either have the software measure the co-occurrence on the web of each value in one field with the other values in the same field (e.g., artist similarity analysis), or have it measure the cross tabulation of values in different fields (e.g., genre classification of song titles). Additional strings to include in all searches and analyses can also be specified.
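The two search modes differ mainly in which pairs of values are queried. The following sketch shows one way the pairs could be enumerated. The function names are hypothetical, and treating within-field pairs as unordered is an assumption rather than documented jWebMiner behaviour.

```python
from itertools import combinations, product


def within_field_queries(values, extra_terms=()):
    """Pair each value with every other value in the same field
    (e.g. artist vs. artist). Assumes pairs are unordered."""
    return [(a, b) + tuple(extra_terms) for a, b in combinations(values, 2)]


def cross_field_queries(field_a, field_b, extra_terms=()):
    """Pair every value of one field with every value of another
    (e.g. song title vs. genre)."""
    return [(a, b) + tuple(extra_terms) for a, b in product(field_a, field_b)]


# Illustrative field values only.
artists = ["Miles Davis", "John Coltrane", "Bill Evans"]
genres = ["Jazz", "Blues"]

similarity_queries = within_field_queries(artists)
classification_queries = cross_field_queries(artists, genres, extra_terms=("music",))
```

Here `extra_terms` stands in for the additional strings that are included in every search.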
Filter terms can also be used. For example, the software can be set to ignore all web pages that do not contain words or phrases such as “music” or “is similar to”. It is also possible to set jWebMiner to simply perform general web searches, or to limit searches to particular sites or assign them higher weights. Such sites could include the All Music Guide, Wikipedia, music review blogs and columns, lyrics repositories, sites storing playlists of individuals or of radio stations, sites storing track listings of compilation albums, etc.
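As a rough sketch of how filter terms and site weighting might fit together, the code below builds a query string and combines per-site hit counts. It assumes the common search engine conventions of quoting exact phrases and using a `site:` operator. How jWebMiner actually encodes its queries is not stated here, and all names, weights and counts are hypothetical.

```python
def build_query(search_strings, required_terms=(), site=None):
    """Build a query in which every string must appear as an exact
    phrase, optionally restricted to a single site."""
    parts = ['"%s"' % s for s in list(search_strings) + list(required_terms)]
    if site is not None:
        # Assumes the search engine supports the "site:" operator.
        parts.append("site:" + site)
    return " ".join(parts)


def weighted_hits(hits_by_site, site_weights, default_weight=1.0):
    """Sum hit counts, giving more influence to trusted sites."""
    return sum(
        count * site_weights.get(site, default_weight)
        for site, count in hits_by_site.items()
    )


query = build_query(["Miles Davis", "Jazz"], ["music"], site="wikipedia.org")

# Hypothetical hit counts and weights, for illustration only.
total = weighted_hits(
    {"wikipedia.org": 10, "example.com": 4},
    {"wikipedia.org": 2.0},
)
```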
jWebMiner can perform web searches using web services offered by either Yahoo! or Google. There are plans to support additional types of web services, such as those offered by Amazon, and an extensible architecture has been implemented to facilitate this.
New Last.FM tag mining in jWebMiner 2.0
jWebMiner 2.0 now includes the option to automatically extract features based on Last.FM Artist social tags. These can be extracted and saved either alone or in combination with the web search engine results described above.
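One simple way to picture combining the two kinds of features is sketched below. It assumes tag counts for an artist have already been retrieved and are normalized over a fixed tag vocabulary. The function names, the vocabulary and all numbers are hypothetical, and this is not a description of jWebMiner's internal representation.

```python
def tag_feature_vector(tag_counts, vocabulary):
    """Turn raw tag counts into proportions over a fixed vocabulary."""
    total = sum(tag_counts.get(tag, 0) for tag in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [tag_counts.get(tag, 0) / total for tag in vocabulary]


def combine_features(web_features, tag_features):
    """Concatenate web search features and tag features into one vector."""
    return list(web_features) + list(tag_features)


# Hypothetical values, for illustration only.
vocabulary = ["jazz", "bebop", "rock"]
tags = tag_feature_vector({"jazz": 3, "bebop": 1}, vocabulary)
combined = combine_features([0.5, 0.25], tags)
```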
Additional functionality beyond what is described here is outlined in jWebMiner's manual. Note that this is the original jWebMiner 1.0 manual, which has not yet been updated to reflect the changes in jWebMiner 2.0. The changes are relatively intuitive to understand.
McKay, C. 2010. Automatic music classification with jMIR. Ph.D. Thesis. McGill University, Canada.
McKay, C., J. A. Burgoyne, J. Hockman, J. B. L. Smith, G. Vigliensoni, and I. Fujinaga. 2010. Evaluating the genre classification performance of lyrical features relative to audio, symbolic and cultural features. Proceedings of the International Society for Music Information Retrieval Conference. 213–8.
McKay, C., and I. Fujinaga. 2010. Improving automatic music classification performance by extracting features from different types of data. Proceedings of the ACM SIGMM International Conference on Multimedia Information Retrieval. 257–66.
Vigliensoni, G., C. McKay, and I. Fujinaga. 2010. Using jWebMiner 2.0 to improve music classification performance by combining different types of features mined from the web. Proceedings of the International Society for Music Information Retrieval Conference. 607–12.
McKay, C., and I. Fujinaga. 2008. Combining features extracted from audio, symbolic and cultural sources. Proceedings of the International Conference on Music Information Retrieval. 597–602.
McKay, C., and I. Fujinaga. 2007. jWebMiner: A web-based feature extractor. Accepted for publication at the 2007 International Conference on Music Information Retrieval.