Options Panel: Processing Options

This section of the Options Panel allows the user to control the parameters of the processing that will be used to detect metadata errors in a music collection. These options are found on the left side of the Options Panel. This section is itself divided into four sub-sections:

1) Find/Replace Settings

This set of preferences allows the user to decide which find/replace operations are to be performed as part of the process of detecting probable metadata errors. All of the options in the Find/Replace section will be bypassed if the Perform find/replace operations option is deselected.

One of the primary functions of jMusicMetaManager is to detect different field values that should in fact be the same. As an example of how find/replace operations can be useful, artist field values of “REM” and “R.E.M.” should likely be identical. Having the Remove all periods option enabled would cause all relevant metadata fields to have all periods removed, thus making it possible to detect this probable error.

All operations in this section are performed in the order that they are listed and are cumulative. For example, the Convert “ and ” to “ & ” option will convert “ And ” to “ & ” if (and only if) the Treat upper/lower case as identical option is selected as well. This is because this latter option occurs before the Convert “ and ” to “ & ” option (“A” must first get converted to “a”).
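The effect of this ordering can be illustrated with a minimal sketch. The class name, method name and boolean flags below are hypothetical stand-ins for three of the options named above and are not taken from the jMusicMetaManager source code. The position of the period removal step relative to the other two is also an assumption. Only the fact that case conversion precedes the “ and ” conversion is described in this manual.

```java
import java.util.Locale;

final class FindReplaceSketch {
    // Applies three illustrative find/replace steps in a fixed order.
    // The flags are hypothetical stand-ins for the corresponding checkboxes.
    static String normalize(String value, boolean ignoreCase,
                            boolean removePeriods, boolean andToAmpersand) {
        String result = value;
        if (ignoreCase) {
            // Case folding happens first, so later steps see lower-case text.
            result = result.toLowerCase(Locale.ROOT);
        }
        if (removePeriods) {
            result = result.replace(".", "");
        }
        if (andToAmpersand) {
            // Matches only the lower-case form " and ", as described above.
            result = result.replace(" and ", " & ");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(normalize("R.E.M.", true, true, false));
        System.out.println(normalize("Simon And Garfunkel", true, false, true));
        System.out.println(normalize("Simon And Garfunkel", false, false, true));
    }
}
```

In this sketch, “Simon And Garfunkel” is only changed by the “ and ” conversion when case folding is also enabled, because the conversion matches the lower-case form only.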

The operations in this Find/Replace Settings section are performed before any reordered word subset operations (with one exception described below) or edit distance operations selected in the Reordered Word Subset Settings and Edit Distance Settings sections respectively. Find/replace operations are also carried through to these two groups of processing tasks. So, for example, if spaces are removed because the Remove all spaces option is selected, then edit distances will all be calculated on field entries that have had spaces removed.

Also, note that any field values (other than titles) found to be identical during find/replace processing will be merged before proceeding to reordered word subset or edit distance processing.

The options in this section are only relevant to the reports in the Reports Describing Detailed Processing Results and Summaries of Probable Errors in Metadata sections of the Report Panel. All other reports are generated before any find/replace operations are performed.

These find/replace operations are performed on the Title, Artist, Composer, Album and Genre fields. However, each of these fields will be excluded if a report option corresponding to a given field is not selected in the Summaries of Probable Errors in Metadata section.

Details on the results of find/replace processing can be viewed in the Fields differing only in case, Detailed replacements made and Newly identical fields after find and replace reports of the Report Panel.

The specific options in this section are as follows:

2) Reordered Word Subset Settings

This set of preferences allows the user to decide which reordered word subset operations are to be performed as part of the process of detecting probable metadata errors, as well as what parameters are to be used.

One of the primary functions of jMusicMetaManager is to detect different field values that should in fact be the same. As an example of how reordered word subset operations can be useful, artist field values of “Ella Fitzgerald” and “Fitzgerald Ella” should likely be identical, as should “Duke Ellington” and “Duke Ellington & His Orchestra”. Reordered word subset operations are useful in detecting cases where one field value should likely be identical to another, but where the words in one value are a subset of the words in the other and/or the words are out of order.

Reordered word subset operations operate by first tokenizing fields into words and then performing appropriate comparisons. These operations treat each word as a single unit, and cannot be used to detect errors within words. The edit distance operations in the Edit Distance Settings section are useful for this latter type of error. Having processing that treats words as fixed units as well as processing that treats each character separately makes jMusicMetaManager more flexible, and has been found experimentally to provide better results in terms of false positives when searching for probable metadata errors.
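The word-level comparisons described above can be sketched as follows. This is an illustrative sketch only, not the jMusicMetaManager implementation: the class and method names are hypothetical, and it assumes that words are separated by whitespace and compared exactly (including case).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class WordSubsetSketch {
    // Splits a field value into words on runs of whitespace (an assumption).
    static String[] tokenize(String value) {
        String trimmed = value.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // True if both values contain exactly the same words in a different order.
    static boolean isReordering(String a, String b) {
        String[] wordsA = tokenize(a);
        String[] wordsB = tokenize(b);
        if (Arrays.equals(wordsA, wordsB)) {
            return false; // identical values are not a reordering
        }
        Arrays.sort(wordsA);
        Arrays.sort(wordsB);
        return Arrays.equals(wordsA, wordsB);
    }

    // True if every word of candidate occurs in other, and other has more
    // distinct words. Word order is ignored.
    static boolean isProperWordSubset(String candidate, String other) {
        Set<String> small = new HashSet<>(Arrays.asList(tokenize(candidate)));
        Set<String> large = new HashSet<>(Arrays.asList(tokenize(other)));
        return !small.isEmpty() && small.size() < large.size()
                && large.containsAll(small);
    }

    public static void main(String[] args) {
        System.out.println(isReordering("Ella Fitzgerald", "Fitzgerald Ella"));
        System.out.println(isProperWordSubset("Duke Ellington",
                "Duke Ellington & His Orchestra"));
    }
}
```

Because each word is treated as a single unit, a misspelling inside a word (for example “Duke Elington”) is not matched by either comparison in this sketch.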

The operations in this section are performed in the order that they are listed and are cumulative. They are performed after all find/replace operations described in the Find/Replace Settings section have been performed, with the notable exception of the Remove all spaces operation, which is performed immediately after the operations in this section if it is selected. Reordered word subset operations are performed before any selected edit distance operations in the Edit Distance Settings section, so any field values found to be identical at this stage of processing (other than titles) will be merged before proceeding to edit distance calculations. Similarly, the field values processed during reordered word subset operations reflect the results of all find/replace operations, and all fields found to be identical during find/replace operations (other than titles) will already have been merged before they undergo reordered word subset processing.

The options in this section are only relevant to the reports referred to in the Reports Describing Detailed Processing Results and Summaries of Probable Errors in Metadata sections found on the right side of the Options Panel. All other reports are generated before reordered word subset operations are performed.

Reordered word subset operations are performed on the Title, Artist, Composer, Album and Genre fields. However, each of these fields will be excluded if a report option corresponding to a given field is not selected in the Summaries of Probable Errors in Metadata section on the right side of the Options Panel.

Details on the results of reordered word subset processing can be viewed in the Fields with scrambled word orderings and Fields whose words are subsets of another reports found in the Report Panel.

The specific options in this section are as follows:

3) Edit Distance Settings

This set of preferences allows the user to decide which edit distance settings are to be used as part of the process of detecting probable metadata errors.

Edit distance, also known as Levenshtein distance, is a measure of the difference between two strings that is often used in information theory and computer science. It is defined as the minimum number of operations needed to transform one given string into another given string. A single operation can consist of inserting, deleting or substituting a single character.
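The definition above corresponds to the standard dynamic programming calculation, which can be sketched as follows. This is a generic textbook formulation given for illustration; the class and method names are hypothetical, and it is not necessarily how jMusicMetaManager computes the value internally.

```java
final class EditDistanceSketch {
    // Returns the minimum number of single-character insertions, deletions
    // and substitutions needed to transform a into b.
    static int levenshtein(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        // Transforming the empty prefix of a into the first j characters
        // of b requires j insertions.
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int substitutionCost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = current[j - 1] + 1;
                int deletion = previous[j] + 1;
                int substitution = previous[j - 1] + substitutionCost;
                current[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("Stravinsky", "Stravinski")); // prints 1
        System.out.println(levenshtein("kitten", "sitting"));        // prints 3
    }
}
```

Only two rows of the table are kept at a time, so memory use grows with the length of the second string rather than with the product of both lengths.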

One of the primary functions of jMusicMetaManager is to detect different field values that should in fact be the same. Edit distance is a particularly useful way of detecting differences in spelling, due either to misspellings, as in the case of “Lynyrd Skynyrd” and “Leonard Skinard”, for example, or to multiple valid spellings of names such as “Stravinski”.

The operations in this section are performed in the order that they are listed and are cumulative. They are also performed after all find/replace and reordered word subset operations described in the Find/Replace Settings and Reordered Word Subset Settings sections have been performed. The field values for which edit distances are found therefore reflect the results of all find/replace operations, and all field values (except titles) found to be identical during find/replace or reordered word subset operations will already have been merged before they undergo edit distance processing.

The options in this section are only relevant to the reports in the Reports Describing Detailed Processing Results and Summaries of Probable Errors in Metadata sections of the right side of the Options Panel. All other reports are generated before edit distance operations are performed.

Edit distance operations are performed on the Title, Artist, Composer, Album and Genre fields. However, each of these fields will be excluded if a report option corresponding to a given field is not selected in the Summaries of Probable Errors in Metadata section of the right side of the Options Panel.

Details on the results of edit distance calculations can be viewed in the Edit distances report of the Report Panel, although it is not recommended that this report be generated for large music collections, as the creation of this report can require significant time and memory. Edit distance calculations themselves can still be performed even when this report is not generated.

The specific options in this section are as follows:

4) Duplicate Recording Detection Settings

One of the primary functions of jMusicMetaManager is to detect redundant multiples of the same recording. The preferences in this section make it possible for the user to control the parameters of the processing used to prevent false redundancies from being reported.

It is expected that metadata fields such as Artist, Composer, Genre and Album will include multiples of the same field value. For example, all tracks in a given album should have the same value for the Album field. jMusicMetaManager therefore puts an emphasis on finding values of such fields that should be the same but are not.

The situation is somewhat different for the Title field, however. Although it is important to find titles that should be the same but are not, it is equally important to give special attention to titles that are in fact the same. Such duplicates could be due to different versions of the same song, which might be desirable, or they could be due to identical copies of the same recording, which might not be desirable.

It is therefore useful to further consider titles that have the same or similar names rather than just merging them, as is done for the Artist, Composer, Genre and Album fields. jMusicMetaManager constructs a list of all clusters of recordings with the same or similar titles, and then filters them based on the settings of this section. The goal of this is to report only those similar titles that are close enough in specified ways to likely be redundant multiples of the same recording.
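The cluster-then-filter approach described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the jMusicMetaManager implementation: the Recording class, the method names and the example filter (matching artist) are all hypothetical, and clusters are formed here from exactly matching titles only, whereas the software also clusters similar titles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

final class DuplicateClusterSketch {
    // Hypothetical minimal recording: only a title and an artist.
    static final class Recording {
        final String title;
        final String artist;
        Recording(String title, String artist) {
            this.title = title;
            this.artist = artist;
        }
    }

    // Groups recordings by title and keeps groups with at least two members.
    static List<List<Recording>> clusterByTitle(List<Recording> recordings) {
        Map<String, List<Recording>> byTitle = new LinkedHashMap<>();
        for (Recording r : recordings) {
            byTitle.computeIfAbsent(r.title, k -> new ArrayList<>()).add(r);
        }
        List<List<Recording>> clusters = new ArrayList<>();
        for (List<Recording> group : byTitle.values()) {
            if (group.size() > 1) {
                clusters.add(group);
            }
        }
        return clusters;
    }

    // Keeps, in each cluster, the first recording and those that pass the
    // caller-supplied filter against it. Clusters left with fewer than two
    // members are dropped.
    static List<List<Recording>> filterClusters(
            List<List<Recording>> clusters,
            BiPredicate<Recording, Recording> likelySameRecording) {
        List<List<Recording>> kept = new ArrayList<>();
        for (List<Recording> cluster : clusters) {
            Recording reference = cluster.get(0);
            List<Recording> survivors = new ArrayList<>();
            for (Recording r : cluster) {
                if (r == reference || likelySameRecording.test(reference, r)) {
                    survivors.add(r);
                }
            }
            if (survivors.size() > 1) {
                kept.add(survivors);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Recording> collection = Arrays.asList(
                new Recording("Yesterday", "The Beatles"),
                new Recording("Yesterday", "The Beatles"),
                new Recording("Yesterday", "Boyz II Men"),
                new Recording("Help!", "The Beatles"));
        List<List<Recording>> clusters = clusterByTitle(collection);
        // Hypothetical filter: recordings must share the same artist.
        List<List<Recording>> probable =
                filterClusters(clusters, (a, b) -> a.artist.equals(b.artist));
        System.out.println(clusters.get(0).size());  // prints 3
        System.out.println(probable.get(0).size());  // prints 2
    }
}
```

Several filters could be combined in this sketch by joining predicates with BiPredicate.and, which mirrors the cumulative behaviour of the filters in this section.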

The filters in this section are applied in the order that they are listed and are cumulative. They are applied after all selected operations in the Find/Replace Settings, Reordered Word Subset Settings and Edit Distance Settings sections have been performed. The clusters of titles that are the same or likely to be the same are constructed as find/replace, reordered word subset and edit distance operations are performed.

The options in this section are only relevant to the Probable duplicates of the same recording report in the Report Panel.

The specific options in this section are as follows:
