ACE Manual
What is ACE?
ACE (Autonomous Classification Engine) is a meta-learning software package for selecting, optimizing and applying machine learning algorithms to music research. Given a set of feature vectors, ACE experiments with a variety of classifiers, classifier parameters, classifier ensemble architectures and dimensionality reduction techniques in order to arrive at a good configuration for the problem at hand. This can be important, as different algorithms can be appropriate for different problems and types of data. ACE is designed to increase classification success rates, facilitate the application of powerful machine learning technology for users of all technical levels and provide a framework for experimenting with new algorithms.
ACE evaluates different configurations in terms of success rates, stability across trials, training times and classification times. Each of these factors may vary in relevance, depending on the goals of the particular task under consideration. Functionality is also being incorporated into ACE allowing users to specify constraints on the amount of time that ACE has to arrive at an appropriate algorithm selection.
ACE may also be used directly as a classifier. Once users have selected the classifier(s) that they wish to use, whether through meta-learning or using pre-existing knowledge, they need only provide ACE with feature vectors and model classifications. ACE then trains itself and presents users with trained models.
ACE is specifically designed to facilitate classification for those new to pattern recognition, both through its use of meta-learning to help inexperienced users avoid inappropriate algorithm selections and through its intuitive GUI (currently under development). ACE is also designed to facilitate the research of those well-versed in machine learning, and includes a command-line interface and well-documented API for those interested in more advanced batch use or in development.
ACE is built on the standardized Weka machine learning infrastructure, and makes direct use of a variety of algorithms distributed with Weka. This means that not only can new algorithms produced by the very active Weka community be immediately incorporated into ACE, but new algorithms specifically designed for MIR research can be developed using the Weka framework. ACE can read features stored in either ACE XML or Weka ARFF files.
For a detailed description of the organization of ACE's main processing classes, please see the Ace Structure page.
How to use ACE from the Command Line
ACE has four main utilities: ACE can be used to train a Weka Classifier object, to classify a data set with a previously trained Weka Classifier, to cross validate a data set with a specific type of Weka Classifier, or to experiment on a data set (to experiment is to test a variety of Classifiers in order to find the optimal classification approach). In addition to this main functionality, ACE also accepts many other options. All of this functionality is accessible through the command line. A complete list of all command line flags appears at the end of this document. Note that ACE will run from the GUI if no processing commands are specified; -train, -classify, -cv, or -exp must be present in the command line arguments for ACE to run from the command line.
Loading
For any processing to occur, data must be loaded. Data can be in ACE XML format or Weka ARFF format. Individual ACE XML files can be loaded using the -ltax (load taxonomy), -lfkey (load feature key), -lfvec (load feature vectors), and -lmclas (load model classifications) flags at the command line. Weka ARFF files are loaded using the -arff flag. ACE is able to load a previously saved ACE project from an ACE project file or an ACE zip file using the -proj and -lzip flags respectively. Instances may only be loaded using one method at a time; they must come from either individual ACE XML files, an ARFF file, an ACE project file, or an ACE zip file. No combination of these methods is permitted.
Examples:
Note: These examples don't have any processing commands and would therefore open the GUI instead of running from the command line.
- java -jar ACE.jar -arff iris.arff
- java -jar ACE.jar -ltax tax.xml -lfkey fdefs.xml -lfvec fvecs.xml -lmclas classes.xml
- java -jar ACE.jar -lzip myACE.zip
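Internally, ACE stores loaded data as Weka Instances objects. For reference, the sketch below shows roughly what loading an ARFF file looks like using the Weka API directly; it is an illustration only (the class name LoadArff is invented here), not ACE's own loading code. As with ACE's -arff flag, the class attribute is assumed to be the last attribute.

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class LoadArff {
        public static void main(String[] args) throws Exception {
            // Read the ARFF file into a Weka Instances object.
            Instances data = DataSource.read("iris.arff");
            // The class attribute is assumed to be the last attribute.
            data.setClassIndex(data.numAttributes() - 1);
            System.out.println(data.numInstances() + " instances loaded.");
        }
    }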
Training
Training is specified with the -train flag. The -train flag itself takes no option. The user must specify the type of Weka Classifier to be trained with the -learner flag and the name of the file to which the trained Classifier should be saved with the -sres flag. Both of these additional flags and their associated options must be present in order for training to occur. Feature vectors, a feature key, and model classifications (but not necessarily a taxonomy) must be loaded in order to train. The type of Classifier is indicated using the following codes:
- Unweighted k-nn (k = 1): IBk
- Naive Bayesian (Gaussian): NaiveBayes
- Support Vector Machine: SMO
- C4.5 Decision Tree: J48
- Backprop Neural Network: MultilayerPerceptron
- AdaBoost seeded with C4.5 Decision Trees: AdaBoostM1
- Bagging seeded with C4.5 Decision Trees: Bagging
Examples:
- java -jar ACE.jar -arff iris.arff -train -learner ibk -sres test.model
- A k-nn Classifier will be trained and saved in a file called test.model.
- java -jar ACE.jar -lfkey fdefs.xml -lfvec fvecs.xml -lmclas classes.xml -train -sres machine.model -learner smo
- A support vector machine Classifier will be trained and saved in a file called machine.model.
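Since -sres saves an ordinary serialized Weka Classifier, the effect of the first example above can be sketched directly against the Weka API. This is a simplified illustration (the class name TrainModel is invented), not ACE's actual implementation:

    import weka.classifiers.lazy.IBk;
    import weka.core.Instances;
    import weka.core.SerializationHelper;
    import weka.core.converters.ConverterUtils.DataSource;

    public class TrainModel {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("iris.arff");
            data.setClassIndex(data.numAttributes() - 1);
            IBk learner = new IBk();        // unweighted k-nn (code: IBk)
            learner.buildClassifier(data);  // train on the loaded instances
            // Equivalent in spirit to -sres test.model.
            SerializationHelper.write("test.model", learner);
        }
    }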
Classifying
Classifying is specified with the -classify flag. A previously trained and saved Classifier must be specified as an option to this command. Feature definitions, feature vectors, and a taxonomy must be loaded. Success rates can only be printed if model classifications are loaded. Classifications can be written to a Weka ARFF file or an ACE XML classifications file (depending on which format was used to load the initial instances).
Examples:
The Classifiers from the previous set of examples are now being tested on the same data sets with which they were trained.
- java -jar ACE.jar -arff iris.arff -classify test.model
- java -jar ACE.jar -lfkey fdefs.xml -lfvec fvecs.xml -lmclas classes.xml -classify machine.model
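Because the saved model is a serialized Weka Classifier, what -classify does can be approximated with the Weka API as follows. Again, this is a simplified sketch using the first example's files (the class name ClassifyData is invented), not ACE's internal code:

    import weka.classifiers.Classifier;
    import weka.core.Instances;
    import weka.core.SerializationHelper;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ClassifyData {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("iris.arff");
            data.setClassIndex(data.numAttributes() - 1);
            // Load the previously trained and saved Classifier.
            Classifier model = (Classifier) SerializationHelper.read("test.model");
            for (int i = 0; i < data.numInstances(); i++) {
                // classifyInstance returns the index of the predicted class.
                double pred = model.classifyInstance(data.instance(i));
                System.out.println(i + ": " + data.classAttribute().value((int) pred));
            }
        }
    }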
Cross Validating
Cross validation is specified with the -cv flag. The user must specify the number of folds to be used during cross validation (at least 2). During cross validation, instances are randomly partitioned into training and testing sets for each fold. Each instance is a testing instance for exactly one fold and is a member of the training set otherwise. For each fold, the training set is used to train the specified type of Classifier, which is then tested on the testing set. Statistics are calculated per fold and used to prepare a report outlining the overall success of this classification approach. Types of Classifiers may be specified with the following codes. (Note that Classifier parameters are preset; plans exist to expand ACE to accept parameter values at the command line, but at this time only default values are available.)
- Unweighted k-nn (k = 1): IBk
- Naive Bayesian (Gaussian): NaiveBayes
- Support Vector Machine: SMO
- C4.5 Decision Tree: J48
- Backprop Neural Network: MultilayerPerceptron
- AdaBoost seeded with C4.5 Decision Trees: AdaBoostM1
- Bagging seeded with C4.5 Decision Trees: Bagging
Examples:
- java -jar ACE.jar -lzip ziptest.zip -cv 3 -learner smo
- A three fold cross validation is performed with support vector machine Classifiers
- java -jar ACE.jar -lfkey fdefs.xml -lfvec fvecs.xml -lmclas classes.xml -cv 4 -learner j48
- A four fold cross validation is performed with C4.5 decision tree Classifiers
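For reference, the core of such a cross validation can be expressed with Weka's Evaluation class. The sketch below mirrors the second example (4 folds, C4.5 decision trees) on an ARFF file; it illustrates the mechanism rather than reproducing ACE's own reporting (the class name CrossValidate is invented):

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class CrossValidate {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("iris.arff");
            data.setClassIndex(data.numAttributes() - 1);
            Evaluation eval = new Evaluation(data);
            // 4-fold cross validation with C4.5 decision trees (code: J48).
            eval.crossValidateModel(new J48(), data, 4, new Random(1));
            System.out.println(eval.toSummaryString());
        }
    }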
Experimenting
Experimentation is specified with the -exp flag. Only the number of cross-validation folds needs to be specified. Experimentation tests a variety of different classification techniques in order to find the optimal approach. The given instances will be tested with four different types of feature selection: principal components, exhaustive search using a naive Bayesian Classifier, genetic search using a naive Bayesian Classifier, and no feature selection. For each of the four sets of dimensionality-reduced Instances, thirty-seven different types of Classifiers will be tested using cross validation. The best classification approaches will be determined for each set of dimensionality-reduced Instances, and overall, by comparing error rates. Once the best classification approach has been selected, a validation test is performed: a new Classifier of the chosen type is created and trained on the chosen type of dimensionality-reduced instances. This validation Classifier is tested on a publication set of instances that was set aside at the beginning of the experiment. Validation results are printed alongside a copy of the cross validation results for the best classification.
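The selection principle behind experimentation can be illustrated in miniature with the Weka API: cross validate each candidate Classifier and keep the one with the lowest error rate. The toy sketch below compares just four classifier types on an ARFF file; ACE's actual experimenter covers thirty-seven types and the four feature-selection settings described above, plus the validation step (the class name MiniExperiment is invented):

    import java.util.Random;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.classifiers.functions.SMO;
    import weka.classifiers.lazy.IBk;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class MiniExperiment {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("iris.arff");
            data.setClassIndex(data.numAttributes() - 1);
            Classifier[] candidates = { new IBk(), new NaiveBayes(), new SMO(), new J48() };
            Classifier best = null;
            double bestError = Double.MAX_VALUE;
            for (Classifier c : candidates) {
                Evaluation eval = new Evaluation(data);
                // 3-fold cross validation of this candidate.
                eval.crossValidateModel(c, data, 3, new Random(1));
                if (eval.errorRate() < bestError) {
                    bestError = eval.errorRate();
                    best = c;
                }
            }
            System.out.println("Best: " + best.getClass().getSimpleName()
                    + " (error rate " + bestError + ")");
        }
    }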
Examples:
- java -jar ACE.jar -proj myproject.xml -exp 3
- java -jar ACE.jar -lfkey fdefs.xml -lfvec fvecs.xml -lmclas classes.xml -exp 5
Zip File Utilities
The ACE zip file allows for an entire ACE project, including all component ACE XML files, to be stored in a single file. The ACE command line provides utilities for creating new ACE zip files, extracting files from a previously saved zip file, and adding and extracting individual files.
When performing zip file operations from the command line, the -zipfile flag must always be present, with a single option specifying the name of the zip file to be accessed or created. If the specified file already exists, ACE will overwrite it without warning. To create a new zip file, use the -dozip flag; all command line arguments that aren't associated with any other flags will be assumed to be files or directories that the user wishes to compress into a zip file. To add any number of files or directories to an existing zip file, use the -zip_add flag. If any of the files to be added are ACE XML files, they will be added to the ACE XML project file for that zip file. (Please note that this process is not efficient: all files previously contained in the zip file will be extracted and re-compressed with the newly added files.) To extract the contents of a zip file, use the -unzip flag. By default, the files will be extracted into a directory with the same name as the zip file (without the extension). The user may choose to extract a single file or a single filetype from an existing zip file. Using the -zip_extract flag, the user can specify a single file to be extracted. If the -filetype flag is present, ACE will extract all files of the specified filetype from the given ACE zip file. Recognized filetypes include: "project_file", "taxonomy_file", "feature_key_file", "feature_vector_file", and "classifications_file". To specify a specific directory into which the contents of the zip file should be extracted, use the -zip_dir flag. Please note that only one of -unzip, -dozip, -zip_add, or -zip_extract may be specified at a time.
Please visit the ACE zip file page for a more detailed description of the structure of ACE zip files.
Examples:
- java -jar ACE.jar -zipfile myproject.zip -dozip myproject
- The contents of the directory "myproject" are being compressed into an ACE zip file called "myproject.zip"
- java -jar ACE.jar -zipfile myproject.zip -zip_add fvec
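Since -zip_add works by extracting and re-compressing the archive's contents, ACE zip files appear to be ordinary zip archives. As a generic illustration of the compression step in the first example above (not ACE's own code, which also maintains the embedded ACE project file), a sketch using java.util.zip:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipOutputStream;

    public class MakeZip {
        public static void main(String[] args) throws Exception {
            // Compress every file in the "myproject" directory into myproject.zip.
            File[] files = new File("myproject").listFiles();
            if (files == null) throw new IllegalArgumentException("Not a directory");
            try (ZipOutputStream out =
                    new ZipOutputStream(new FileOutputStream("myproject.zip"))) {
                for (File f : files) {
                    if (!f.isFile()) continue;  // subdirectories skipped in this sketch
                    out.putNextEntry(new ZipEntry(f.getName()));
                    try (FileInputStream in = new FileInputStream(f)) {
                        in.transferTo(out);     // Java 9+; copies the file's bytes
                    }
                    out.closeEntry();
                }
            }
        }
    }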
Other Options
- Max Class Membership Spread - This can be specified with the -max_spread flag and can be used whenever training is occurring. The given value is the maximum permitted ratio between the number of training instances belonging to the most populous class and the number belonging to the least populous class. For example, a value of 2 means that no class may have more than twice as many training instances as the class with the fewest training instances. If a class has more training instances than this allows, then a randomly selected set of instances up to the maximum is kept for use in training and all others are eliminated. A value of 0 means that no maximum spread is enforced and a value of 1 enforces a uniform distribution. Instances may be reordered. The value defaults to 0 if not specified at the command line.
- Max Class Membership Count - This can be specified using the -max_memb flag and can be used whenever training is occurring. The given value sets the maximum number of instances that may belong to any one class. If a class has more training instances than this number, then a randomly selected set of instances up to the maximum is kept for use in training, and all others are eliminated (a sketch of this thinning appears after this list). A value of 0 means that no maximum is enforced. The value defaults to 0 if not specified at the command line.
- Order Randomly - This can be specified with the -rand_ord flag. Its presence causes the training instances to be randomly reordered.
- Verbose - If -verbose is included in the command line arguments, extra detail about the processing being performed will be included in the results. If dimensionality reduction is being performed, detailed information about the attribute selection will be included. If cross validation is being performed, the partitioning, model classification, and predicted classification (if applicable) of each instance will be included. An asterisk will also appear next to instances that were incorrectly classified.
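As promised above, a sketch of the class-membership thinning performed for -max_memb. This approximates the documented behaviour (keep at most a given number of randomly chosen instances per class); it is not ACE's actual code, and the class and method names are invented:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;
    import weka.core.Instance;
    import weka.core.Instances;

    public class ThinByClass {
        // Keep at most maxMemb randomly chosen instances per class (0 = no limit).
        // Assumes the class index of 'data' has already been set.
        public static Instances thin(Instances data, int maxMemb, Random rand) {
            if (maxMemb <= 0) return data;  // 0 means no maximum is enforced
            List<List<Instance>> byClass = new ArrayList<>();
            for (int c = 0; c < data.numClasses(); c++) byClass.add(new ArrayList<>());
            for (int i = 0; i < data.numInstances(); i++) {
                Instance inst = data.instance(i);
                byClass.get((int) inst.classValue()).add(inst);
            }
            Instances thinned = new Instances(data, 0);  // empty set, same header
            for (List<Instance> members : byClass) {
                Collections.shuffle(members, rand);      // random selection
                for (int i = 0; i < Math.min(maxMemb, members.size()); i++) {
                    thinned.add(members.get(i));
                }
            }
            return thinned;  // note: instances are reordered, as the manual warns
        }
    }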
Command Line Flags
GENERAL OPTIONS:
-help: Display a guide to this utility. No option is needed. All possible flags will be listed in a format similar to this one.
LOADING OPTIONS:
-proj: Automatically load the ACE project file specified by the single option.
-lzip: Load an ACE project from the specified ACE zip file. The ACE zip file will be extracted into a default temporary directory unless another directory is specified with the -zip_dir flag. If running the ACE GUI, it will be loaded with a blank project if no initial project is specified with the -proj or -lzip flag.
-ltax: Load the specified taxonomy_file XML file.
-lfkey: Load the specified feature_key_file XML file.
-lfvec: Load the specified feature_vector_file XML file(s).
-lmclas: Load the specified classifications_file XML file(s).
-arff: Load training or testing data from an ARFF file instead of XML file(s). Note that it is assumed that the class attribute is the last attribute.
TRAINING OPTIONS:
-train: Train the given type of classifier and save it in the given file. The -train flag itself takes no options.
-learner: Mandatory flag that specifies the type of Classifier to be trained. Types of Classifiers are specified according to the following codes:
- Unweighted k-nn (k = 1): IBk
- Naive Bayesian (Gaussian): NaiveBayes
- Support Vector Machine: SMO
- C4.5 Decision Tree: J48
- Backprop Neural Network: MultilayerPerceptron
- AdaBoost seeded with C4.5 Decision Trees: AdaBoostM1
- Bagging seeded with C4.5 Decision Trees: Bagging
-sres: Mandatory flag that specifies the path name of the file to which to save the trained Classifier.
-dr: Takes a single option specifying the type of dimensionality reduction to be performed. If null, no dimensionality reduction will be performed. Codes for feature selectors are as follows:
- Principal Components: PCA
- Exhaustive search using naive Bayesian classifier: EXB
- Genetic search using naive Bayesian classifier: GNB
-sarff: Saves training data to an ARFF file after parsing, after thinning, and again after feature selection, if any. Useful for testing.
-max_spread: The maximum ratio between the number of training instances belonging to any class and the number belonging to the least populous class. This will be set to 0.0 (no maximum) if not specified otherwise.
-max_memb: The maximum number of training instances that may belong to any one class. This will be set to 0 (no maximum) if not specified otherwise.
-rand_ord: The presence of this flag causes training instances to be randomly reordered.
-verbose: Detailed information about the dimensionality reduction that was performed will be printed.
CLASSIFYING OPTIONS:
-classify: Perform classifications using a trained classifier. Takes the path name of a saved classifier as its option.
-sres: Save the test results in an ACE XML classifications file or an ARFF file, depending on the filetype of the input data.
-sarff: Saves testing data to an ARFF file after parsing and again after feature selection, if any. Useful for testing.
CROSS-VALIDATING OPTIONS:
-cv: Perform a cross validation. Must specify the number of folds as the option.
-learner: Mandatory flag that specifies what type of Classifier to use for cross validation. Classifier types are specified according to the following codes:
- Unweighted k-nn (k = 1): IBk
- Naive Bayesian (Gaussian): NaiveBayes
- Support Vector Machine: SMO
- C4.5 Decision Tree: J48
- Backprop Neural Network: MultilayerPerceptron
- AdaBoost seeded with C4.5 Decision Trees: AdaBoostM1
- Bagging seeded with C4.5 Decision Trees: Bagging
-fs: Takes a single option specifying the type of dimensionality reduction to be performed. If null, no feature selection will be performed. Codes for feature selectors are as follows:
- Principal Components: PCA
- Exhaustive search using naive Bayesian classifier: EXB
- Genetic search using naive Bayesian classifier: GNB
-sres: Saves results in a text file with the given name. If not present, results are only printed to standard out.
-sarff: Saves testing data to an ARFF file after parsing and again after feature selection, if any. Useful for testing.
-max_spread: The maximum ratio between the number of training instances belonging to any class and the number belonging to the least populous class. This will be set to 0.0 (no maximum) if not specified otherwise.
-max_memb: The maximum number of training instances that may belong to any one class. This will be set to 0 (no maximum) if not specified otherwise.
-rand_ord: The presence of this flag causes training instances to be randomly reordered.
-verbose: The results for the partitioning and classification of each individual instance are printed and saved, along with detailed information about the dimensionality reduction that was performed. Incorrect classifications are marked with an asterisk.
EXPERIMENTATION OPTIONS:
-exp: Perform an experiment and output the results to standard out. The option specifies the number of cross-validation folds.
-sres: Saves results in files with the given base file name. If not present, results are saved with a default base file name.
-max_spread: The maximum ratio between the number of training instances belonging to any class and the number belonging to the least populous class. This will be set to 0.0 (no maximum) if not specified otherwise.
-max_memb: The maximum number of training instances that may belong to any one class. This will be set to 0 (no maximum) if not specified otherwise.
-rand_ord: The presence of this flag causes training instances to be randomly reordered.
-verbose: Detailed information about the dimensionality reduction that was performed will be printed.
ZIP UTILITIES:
-zipfile: Specifies the path to the ACE zip file that is to be created or accessed. When decompressing (-unzip or -zip_extract), this will be the name of the ACE zip file from which to extract. When compressing (-dozip or -zip_add), this will be the name of the new zip file to be created or the previously existing zip file to which to add. This flag is required for all zip file processing operations.
-dozip: Compresses the given files into an ACE zip file. The -zipfile flag is required. The rest of the arguments given after this and all other ACE command line flags will be assumed to be files or directories to be included in the new ACE zip file.
-unzip: No option is required for this flag. The contents of the zip file specified with the -zipfile flag will be extracted into a default directory unless the -zip_dir flag is present.
-zip_add: Specifies a list of one or more files and/or directories to be added to a previously existing zip file.
-zip_extract: Specifies a single file to be extracted from a previously existing zip file. If the -filetype flag is present, the option of -zip_extract will be ignored.
-filetype: Specifies the type of ACE XML file to be extracted from a previously existing zip file. This flag may only be used in conjunction with the -zip_extract flag.
-zip_dir: Optional flag to specify the directory into which the contents of a zip file should be extracted. This flag can be used when either the -unzip or -zip_extract flags are present.
How to use ACE from the Graphic User Interface
The ACE GUI is currently under construction. Currently it serves as a tool for viewing and editing ACE XML files. Eventually, the GUI will also be able to perform experiments on data sets. It is divided into six panes: Taxonomy, Features, Instances, Preferences, Classification Settings, and Experimenter. The Taxonomy, Features, and Instances panes allow the user to view ACE XML files. Currently the Preferences, Classification Settings, and Experimenter panes are empty. In the near future, the Experimenter pane will provide the ability to perform experiments.
Loading
By default, when the ACE GUI starts it will load an empty project. There are multiple ways to load data into ACE.
- From the command line: Using the -lzip or -proj flags at the command line, the user can specify either an ACE project file or an ACE zip file from which he/she would like to load data. For example, one might type "java -jar dist/ACE.jar -lzip myzip.zip" at a command line prompt to load the ACE GUI with the data contained in the ACE zip file "myzip.zip". The user may also specify specific ACE XML files to load using the same flags that are used when running ACE from the command line (-ltax, -lfvec, -lfkey, -lmclas).
- Load Zip File menu item: In the File menu, there is an option to "Load Zip". This will allow the user to specify an ACE zip file (or ACE project file) from which to load data.
- Load Configuration Files dialog box: Also in the File menu is the option to "Load Configuration Files". This opens the project files dialog box, which allows the user to specify data files in a number of ways. The user may specify an ACE project file or an ACE zip file, he/she may specify specific ACE XML files, or the user may even specify an ARFF file from which to load data.
- Within the individual datatype panels: Within the panels for each type of ACE datatype, there are tools for loading, creating, and editing ACE XML files.
Viewing and Editing
Currently, the main functionality of the ACE GUI is to view and edit ACE XML files. The GUI is divided into panels that are designed to represent the same information that is stored in ACE XML files. Within these panels one can easily load, modify, and save data.
A valuable feature of the ACE GUI is its ability to easily convert Weka ARFF files into ACE XML files. Once an ARFF file is loaded into the GUI from the Load Configuration Files dialog box, the data is stored internally as ACE datatypes. This data can then easily be saved in ACE XML files.
Experimentation
Currently no Experimentation is possible with the ACE GUI. In the near future, the Experimenter Panel will contain tools for running and viewing the results of experiments.
Related Publications
McKay, C., and I. Fujinaga. 2007. Style-independent computer-assisted exploratory analysis of large music collections. Journal of Interdisciplinary Music Studies 1 (1): 63–85.
McKay, C., R. Fiebrink, D. McEnnis, B. Li, and I. Fujinaga. 2005. ACE: A framework for optimizing music classification. Proceedings of the International Conference on Music Information Retrieval. 42–9.
McKay, C., D. McEnnis, R. Fiebrink, and I. Fujinaga. 2005. ACE: A general-purpose classification ensemble optimization framework. Proceedings of the International Computer Music Conference. 161–4.
Sinyor, E., C. McKay, R. Fiebrink, D. McEnnis, and I. Fujinaga. 2005. Beatbox classification using ACE. Proceedings of the International Conference on Music Information Retrieval. 672–5.
Questions and Comments