Options Panel

The Options Panel lets users select options that affect the details of feature extraction. The panel is divided into two halves. The left half is used to select the web service(s) to which queries are submitted and to set query preferences that affect the types and numbers of hits returned. The right half is used to set how final feature values are calculated from raw hit counts and which reports are displayed in the Results Panel after feature extraction is complete. A screenshot of the Options Panel is shown in Figure 1.

Figure 1: A screenshot of the Options Panel showing default settings.

This section of the manual is divided into seven parts, one for each section of the Options Panel interface:

1) WEB SERVICES TO SEARCH

The options in this section allow the user to control which web services to submit queries to. Passwords to use the services may also be entered here, if appropriate.

Multiple web services may be selected, but doing so makes feature extraction take longer than using a single web service. The hit counts for each selected web service are combined during final feature calculation. The limitations of each web service are described in the Hints and Suggestions section of this manual.

It may be useful for those with software development backgrounds to implement additional web services, as described in the Extending the Software section of this manual.

The specific options in this section are as follows:

2) GENERAL SEARCH SETTINGS

The options in this section allow the user to control miscellaneous settings influencing how searches are performed and what types of results are returned.

Unless they are expert users, it is generally best for users to leave these preferences at their defaults. These controls can have a significant effect on returned hit counts, and not all of these options are supported by all web services.

The specific options in this section are as follows:

3) LANGUAGE, REGIONAL AND FILE TYPE FILTERS

The options in this section allow the user to control filters (of a different type than those in the Required Filter Words Panel and Excluded Filter Terms Panel) that limit the types of hits that can be returned. The geographical region that searches are performed in can also be specified. These options must be chosen from the available options in the provided combo boxes. A choice of "No Limitations" means that the corresponding filter is not applied.

The specific options in this section are as follows:

4) FEATURE SCORE CALCULATION SETTINGS

The options in this section control whether various types of normalization are applied during final feature calculation. None, some or all of these normalizations can be applied. The specific options in this section are as follows:

5) CO-OCCURRENCE SCORING FUNCTION

This section controls which formula is used to calculate final feature values from hit counts when the Co-Occurrence Extraction option is selected in the Search Words Panel. These formulas are applied to hit counts after hit counts have been combined across sources (as set in the Site Weightings Panel) and across web services (as set in the WEB SERVICES TO SEARCH section of the Options Panel). These formulas are also applied after source weightings (as set in the Site Weightings Panel) and normalizations (as set in the FEATURE SCORE CALCULATION SETTINGS section of the Options Panel) have been used to process hit counts. The exception to this is the Normalize feature settings normalization, which is applied after the chosen formula has been applied.
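The order of operations described above can be sketched in code. This is only an illustrative sketch, not the software's actual implementation: the function name extract_features, the data layout, the weighted sum across sources, the mean across web services, and the divide-by-total final normalization are all assumptions made for the example. Only the ordering of the stages is taken from the description above.

```python
from typing import Callable, Dict, List

def extract_features(
    hit_counts: Dict[str, Dict[str, List[float]]],   # {web service: {source: counts per query}}
    site_weights: Dict[str, float],                  # hypothetical per-source weights
    pre_normalize: Callable[[List[float]], List[float]],
    scoring_function: Callable[[List[float]], List[float]],
    normalize_features: bool,
) -> List[float]:
    per_service = []
    for sources in hit_counts.values():
        n = len(next(iter(sources.values())))
        combined = [0.0] * n
        # Stage 1: apply source weightings and combine across sources
        # (a weighted sum is an assumption).
        for source, counts in sources.items():
            weight = site_weights.get(source, 1.0)
            for i, count in enumerate(counts):
                combined[i] += weight * count
        per_service.append(combined)

    # Stage 2: combine across web services (a mean is an assumption).
    merged = [sum(values) / len(per_service) for values in zip(*per_service)]

    # Stage 3: normalizations that run before the scoring formula.
    merged = pre_normalize(merged)

    # Stage 4: the chosen scoring formula.
    scores = scoring_function(merged)

    # Stage 5: the one normalization that runs after the formula
    # (dividing by the total is an assumption).
    if normalize_features:
        total = sum(scores)
        if total > 0:
            scores = [s / total for s in scores]
    return scores


def identity(values: List[float]) -> List[float]:
    return list(values)

example_counts = {
    "ServiceA": {"x.com": [10, 30], "y.com": [0, 20]},
    "ServiceB": {"x.com": [20, 20], "y.com": [10, 0]},
}
example_weights = {"x.com": 1.0, "y.com": 0.5}
raw = extract_features(example_counts, example_weights, identity, identity, False)
scaled = extract_features(example_counts, example_weights, identity, identity, True)
```

The point of the sketch is that the scoring formula always sees hit counts that have already been weighted, combined and normalized, and that only the final normalization operates on the formula's output.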

Which formula is best to use depends on the kind of search that is being performed, and it can be useful to experiment with different formulas. It may also be useful for those with software development backgrounds to implement additional formulas, as described in the Extending the Software section of this manual.

The specific options in this section are given below, along with references to publications that detail their previous use.

The notation used in the above formulas is defined as follows:

6) CROSS TABULATION SCORING FUNCTION

This section controls which formula is used to calculate final feature values from hit counts when the Cross Tabulation Extraction option is selected in the Search Words Panel. These formulas are applied to hit counts after hit counts have been combined across sources (as set in the Site Weightings Panel) and across web services (as set in the WEB SERVICES TO SEARCH section of the Options Panel). These formulas are also applied after source weightings (as set in the Site Weightings Panel) and normalizations (as set in the FEATURE SCORE CALCULATION SETTINGS section of the Options Panel) have been used to process hit counts. The exception to this is the Normalize feature settings normalization, which is applied after the chosen formula has been applied.

Which formula is best to use depends on the kind of search that is being performed, and it can be useful to experiment with different formulas. It may also be useful for those with software development backgrounds to implement additional formulas, as described in the Extending the Software section of this manual.

The specific options in this section are given below, along with references to publications that detail their previous use.

The notation used in the above formulas is defined as follows:

7) INFORMATION TO REPORT

The options in this section control which types of reports are generated and displayed in the Results Panel after feature extraction is complete. These reports are each presented in tables labeled by search strings (except for the Search settings used report).

All, some or none of these reports may be selected. Regardless of which reports are selected here, feature values are stored after extraction so that they can be saved from the Results Panel as ACE XML, Weka ARFF or newline-delimited text files. Reports are displayed in the Results Panel in the same order in which they appear here in the Options Panel. Because they show processing at various intermediate stages, these reports can be useful for debugging and for understanding why feature scores are as they are.
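For readers unfamiliar with the Weka ARFF format mentioned above, the following minimal sketch shows the general shape of such a file: a relation name, one attribute declaration per feature, and comma-separated data rows. The helper names quote and to_arff, the relation name and the feature names are hypothetical, and the exact attribute layout that the software writes is not specified here.

```python
from typing import List, Sequence

def quote(name: str) -> str:
    """Quote an ARFF name if it contains spaces or other special characters."""
    if name and all(ch.isalnum() or ch in "_-." for ch in name):
        return name
    return "'" + name.replace("\\", "\\\\").replace("'", "\\'") + "'"

def to_arff(relation: str, attribute_names: List[str],
            rows: Sequence[Sequence[float]]) -> str:
    """Build ARFF text with one numeric attribute per feature value."""
    lines = ["@relation " + quote(relation), ""]
    lines += ["@attribute " + quote(n) + " numeric" for n in attribute_names]
    lines += ["", "@data"]
    lines += [",".join(str(v) for v in row) for row in rows]
    return "\n".join(lines) + "\n"

# Hypothetical feature names and values, for illustration only.
arff_text = to_arff("hits", ["jazz", "hip hop"], [[0.25, 0.75]])
```

Names containing spaces, such as multi-word search strings, are quoted so that Weka reads each one as a single attribute name.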

Note that when the Co-Occurrence Extraction option is selected in the Search Words Panel, entries on the diagonal of each report table are left empty and are set to 0 if saved, as they are not significant.
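The diagonal handling described above can be illustrated with a small sketch. The function names and the use of None to represent an empty table cell are assumptions made for the example, not the software's actual internals.

```python
from typing import List, Optional

def blank_diagonal(matrix: List[List[float]]) -> List[List[Optional[float]]]:
    """Return a copy for display, with diagonal cells left empty (None)."""
    return [
        [None if i == j else value for j, value in enumerate(row)]
        for i, row in enumerate(matrix)
    ]

def for_saving(table: List[List[Optional[float]]]) -> List[List[float]]:
    """Return a copy for saving, with empty cells written as 0."""
    return [[0.0 if value is None else value for value in row] for row in table]

# Hypothetical co-occurrence scores for two search strings.
scores = [[5.0, 2.0], [3.0, 7.0]]
displayed = blank_diagonal(scores)
saved = for_saving(displayed)
```

A diagonal cell pairs a search string with itself, which is why it carries no co-occurrence information and can be blanked without losing anything.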

The specific options in this section are as follows:
