Full-Text Specialty Data Store User's Guide
Chapter 4 Setting Up Verity Functions
Creating a Custom Thesaurus (Enhanced Version Only)
The Verity thesaurus operator expands a search to include the specified word and its synonyms (for information on using the thesaurus operator, see "thesaurus"). In the Enhanced version of the Full-Text Search engine, you can create a custom thesaurus that contains application-specific synonyms to use in place of the default thesaurus.
For example, the default English language thesaurus contains these words as synonyms for "money": "cash," "currency," "lucre," "wampum," and "greenbacks." You can create a custom thesaurus that contains a different set of synonyms for "money," such as "bid," "tokens," "credit," "asset," and "verbal offer."
To create a custom thesaurus, follow these steps:
1. Make a list of the synonyms that you will use with your application. It may help to examine the default thesaurus (see "Examining the Default Thesaurus (Optional)").
2. Create a control file that contains the synonyms you are defining for your custom thesaurus (see "Creating the Control File").
3. Create the custom thesaurus using the mksyd utility, which takes the control file as input (see "Creating the Thesaurus").
4. Replace the default thesaurus with your custom thesaurus (see "Replacing the Default Thesaurus with the Custom Thesaurus").
For more information on "Custom Thesaurus Support" and the mksyd utility, see the Verity Web site.
In the Enhanced version of the Full-Text Search engine, two sample files illustrate how to set up and use a custom thesaurus:
sample_text_thesaurus.ctl is a sample control file
sample_text_thesaurus.sql issues queries against the custom thesaurus defined in the sample control file
These files are in the $SYBASE/$SYBASE_FTS/sample/scripts directory.
Examining the Default Thesaurus (Optional)
A control file contains all the synonym definitions for a thesaurus. To examine the default thesaurus, create its control file using the mksyd utility. Use the syntax:
mksyd -dump -syd $SYBASE/$SYBASE_FTS/verity/common/vdkLanguage/vdk20.syd -f work_location/control_file.ctl
where:
vdkLanguage - is the value of the vdkLanguage configuration parameter (for example, "english")
work_location - is the directory where you want to place the control file
control_file - is the name of the control file you are creating from the default thesaurus
Examine the control file (control_file.ctl) that mksyd creates to view the default synonym lists.
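The dump step above can be wrapped in a small shell helper. This is a minimal sketch, not a supported script: the function names and the MKSYD override variable are hypothetical, and it assumes that SYBASE and SYBASE_FTS are set in the environment and that mksyd is on the PATH. Only the mksyd options shown in the syntax above are used.

```shell
#!/bin/sh
# Sketch only: MKSYD, dump_default_thesaurus, and find_synonyms are
# hypothetical names introduced here, not part of the product.
MKSYD=${MKSYD:-mksyd}   # assumes mksyd is on PATH unless overridden

# dump_default_thesaurus LANGUAGE WORK_LOCATION CONTROL_FILE
# Writes WORK_LOCATION/CONTROL_FILE.ctl from the default thesaurus.
dump_default_thesaurus() {
    language=$1
    work_location=$2
    control_file=$3
    "$MKSYD" -dump \
        -syd "$SYBASE/$SYBASE_FTS/verity/common/$language/vdk20.syd" \
        -f "$work_location/$control_file.ctl"
}

# find_synonyms WORD CONTROL_FILE
# Prints, with line numbers, every line of the dumped file that mentions WORD
# (case-insensitive substring match).
find_synonyms() {
    grep -in -- "$1" "$2"
}

# Example (not run here):
#   dump_default_thesaurus english /tmp default_thesaurus
#   find_synonyms money /tmp/default_thesaurus.ctl
```

The search is a plain text match, so it makes no assumption about the layout of the dumped control file beyond it being readable text.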
Creating the Control File
Create a control file that contains the new synonyms for your custom thesaurus. The control file is an ASCII text file in a structured format. Using a text editor (such as vi or emacs), either:
Edit the control file from the default thesaurus and add new synonyms to the existing thesaurus (see "Examining the Default Thesaurus (Optional)"), or
Create a new control file that includes only your synonyms
The control file contains synonym list definitions in a synonyms: statement. For example, the following is a control file named colors.ctl:
$control: 1
synonyms:
{
list: "red, ruby, scarlet, fuchsia,\
magenta"
list: "electric blue <or> azure"
/keys = "lapis"
}
$$
The synonyms: statement includes:
list: keywords that specify the start of a synonym list. The synonyms in the list are either in query form or in a list of words or phrases separated by commas.
Each list: can optionally have a /keys modifier that specifies one or more keys separated by commas. In the example above, no keys are specified in the first "list". This means the list is found when the thesaurus is queried for the word "red," "ruby," "scarlet," "fuchsia," or "magenta." The second "list" uses the /keys modifier to specify one key. This means the words or phrases in the list will satisfy a query only when you specify <thesaurus>lapis.
If you use emacs to build a synonym list and any of your lists go beyond one line, turn off auto-fill mode. If you separate your list into multiple lines, include a backslash (\) at the end of each line so that the lines are treated as one list.
For more complex examples of control files, see the Verity Web site.
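If you prefer to generate the control file from a script rather than an editor, a quoted here-document keeps the file contents literal. The sketch below writes the colors.ctl example shown above. The function name write_colors_ctl is hypothetical, and the closing `$$` line is assumed to be the file terminator that follows the synonyms: block in the sample.

```shell
#!/bin/sh
# Sketch only: write_colors_ctl is a hypothetical helper name.

# write_colors_ctl PATH
# Writes the sample colors control file to PATH.
write_colors_ctl() {
    # The quoted delimiter ('EOF') stops the shell from expanding $control
    # and $$, and from swallowing the trailing backslash that continues
    # the first list onto a second line.
    cat > "$1" <<'EOF'
$control: 1
synonyms:
{
list: "red, ruby, scarlet, fuchsia,\
magenta"
list: "electric blue <or> azure"
/keys = "lapis"
}
$$
EOF
}

# Example (not run here):
#   write_colors_ctl /usr/u/sybase/dba/thesaurus/colors.ctl
```

With an unquoted delimiter, the shell would replace `$$` with its process ID and join the two lines of the first list, so the quoting matters here.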
Creating the Thesaurus
The mksyd utility creates the custom thesaurus using a control file as input. It is located in:
$SYBASE/$SYBASE_FTS/verity/bin
Run, or define an alias to run, mksyd from this bin directory. Create your custom thesaurus in any work directory.
The mksyd syntax for creating a custom thesaurus is:
mksyd -f control_file.ctl -syd custom_thesaurus.syd
where:
control_file - is the name of the control file you created in "Creating the Control File"
custom_thesaurus - is the name of the custom thesaurus you are creating
For example, to execute the mksyd utility reading the sample control file defined above, and directing output to a work directory, use the syntax:
mksyd -f /usr/u/sybase/dba/thesaurus/colors.ctl -syd /usr/u/sybase/dba/thesaurus/custom.syd
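A thin wrapper can catch a mistyped control file path before mksyd runs. This is a sketch under stated assumptions: build_thesaurus and the MKSYD override are hypothetical names, mksyd is assumed to be on the PATH (or aliased as described above), and the check that a successful run leaves a non-empty .syd file is an assumption rather than documented behavior.

```shell
#!/bin/sh
# Sketch only: MKSYD and build_thesaurus are hypothetical names.
MKSYD=${MKSYD:-mksyd}   # assumes mksyd is on PATH unless overridden

# build_thesaurus CONTROL_FILE OUTPUT_SYD
# Runs mksyd on CONTROL_FILE and reports failure if no output appears.
build_thesaurus() {
    ctl=$1
    out=$2
    if [ ! -r "$ctl" ]; then
        echo "control file not readable: $ctl" >&2
        return 1
    fi
    "$MKSYD" -f "$ctl" -syd "$out" || return 1
    # Assumption: a successful run leaves a non-empty thesaurus file behind.
    [ -s "$out" ]
}

# Example (not run here):
#   build_thesaurus /usr/u/sybase/dba/thesaurus/colors.ctl \
#                   /usr/u/sybase/dba/thesaurus/custom.syd
```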
Replacing the Default Thesaurus with the Custom Thesaurus
The default thesaurus, named vdk20.syd, is located in:
$SYBASE/$SYBASE_FTS/verity/common/vdkLanguage
where vdkLanguage is the value of the vdkLanguage configuration parameter (for example, the English directory is $SYBASE/$SYBASE_FTS/verity/common/english). Each application and user reading from this location at runtime uses this thesaurus. To replace it with your custom thesaurus, follow these steps:
1. Back up the default thesaurus before replacing it with the custom thesaurus. For example:
mv $SYBASE/$SYBASE_FTS/verity/common/english/vdk20.syd default.syd
2. Replace the vdk20.syd file with your custom thesaurus. For example:
cp custom.syd $SYBASE/$SYBASE_FTS/verity/common/english/vdk20.syd
3. Restart your Full-Text Search engine; no configuration file changes are required. The thesaurus is read from this location when the Full-Text Search engine is started, not when a query is executed.
Queries using the thesaurus operator will now use the custom thesaurus.
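The backup and replace steps can be combined into one guarded helper. The sketch below is illustrative only: replace_thesaurus is a hypothetical name, and it deliberately copies the default thesaurus to the backup location (rather than moving it, as in the example above) so that vdk20.syd stays in place if a later step fails. It also refuses to overwrite an existing backup, so running it twice cannot replace the saved default with a custom file.

```shell
#!/bin/sh
# Sketch only: replace_thesaurus is a hypothetical helper name.

# replace_thesaurus CUSTOM_SYD LANGUAGE_DIR BACKUP_FILE
# Backs up LANGUAGE_DIR/vdk20.syd to BACKUP_FILE, then installs CUSTOM_SYD.
replace_thesaurus() {
    custom=$1
    language_dir=$2
    backup=$3
    target="$language_dir/vdk20.syd"

    if [ ! -s "$custom" ]; then
        echo "custom thesaurus missing or empty: $custom" >&2
        return 1
    fi
    if [ ! -f "$target" ]; then
        echo "no default thesaurus at: $target" >&2
        return 1
    fi
    if [ -e "$backup" ]; then
        echo "backup already exists, not overwriting: $backup" >&2
        return 1
    fi

    cp -p "$target" "$backup" || return 1   # keep the default thesaurus
    cp "$custom" "$target"                  # install the custom thesaurus
}

# Example (not run here):
#   replace_thesaurus custom.syd \
#       "$SYBASE/$SYBASE_FTS/verity/common/english" ./default.syd
```

The helper does not restart the Full-Text Search engine; that step is still required before queries pick up the new thesaurus.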