Sybase Technical Library - Product Manuals Home

Chapter 3: Configuring Adaptive Server for Full-Text Searches [Table of Contents] Chapter 5: Writing Full-Text Search Queries

Full-Text Search Specialty Data Store User's Guide


Chapter 4

Setting Up Verity Functions

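This chapter describes the setup required before you can write queries that use certain Verity functionality. It covers enabling query-by-example, summarization, and clustering; setting up a column to use as a sort specification; using filters on text that contains tags; creating a custom thesaurus; and creating topics.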

Enabling Query-By-Example, Summarization, and Clustering

The style.prm file specifies additional data to include in the text indexes to support query-by-example, summarization, and clustering.

You can enable these features for all text indexes by editing the master style.prm file, or you can enable them for an individual text index by editing its style.prm file. Both methods are described below.

Query-By-Example and Clustering

To use phrases in a query-by-example specification and to use clustering, you must enable the generation of document feature vectors at indexing time. To do this, uncomment the following line in the style.prm file:

$define DOC-FEATURES "TF"

Summarization

To configure the Full-Text Search engine for summarization, uncomment one of the following lines that start with "#$define" in the style.prm file:

# The example below stores the best three sentences of
# the document, but not more than 255 bytes.
#$define DOC-SUMMARIES "XS MaxSents 3 MaxBytes 255"
# The example below stores the first four sentences of
# the document, but not more than 255 bytes.
#$define DOC-SUMMARIES "LS MaxSents 4 MaxBytes 255"
# The example below stores the first 150 bytes of
# the document, with whitespace compressed.
#$define DOC-SUMMARIES "LB MaxBytes 150"

Each of these lines selects a different summarization method. You can specify how many sentences or bytes the Full-Text Search engine includes in a summary by altering the numbers at the ends of these lines. For example, to limit summaries to the first four sentences and at most 233 bytes, edit the LS line to read:

$define DOC-SUMMARIES   "LS MaxSents 4 MaxBytes 233"

The maximum number of bytes displayed is 255. Any number greater than that is truncated to 255.
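If you generate style.prm settings from a script, the rules above fit in a few lines. The following is a minimal Python sketch; the function doc_summaries_line is hypothetical and not part of Sybase or Verity. It assumes only the XS, LS, and LB summary types and the MaxSents and MaxBytes keywords shown above, and it caps MaxBytes at 255 to mirror the engine's truncation.

```python
from typing import Optional

# Summary types that appear in the sample style.prm lines above.
SENTENCE_TYPES = {"XS", "LS"}  # best sentences / leading sentences
BYTE_TYPES = {"LB"}            # leading bytes only
MAX_SUMMARY_BYTES = 255        # larger values are truncated to 255 by the engine


def doc_summaries_line(kind: str, max_bytes: int, max_sents: Optional[int] = None) -> str:
    """Build an uncommented DOC-SUMMARIES line for style.prm (hypothetical helper)."""
    if max_bytes < 1:
        raise ValueError("max_bytes must be positive")
    if kind in SENTENCE_TYPES:
        if max_sents is None or max_sents < 1:
            raise ValueError(f"{kind} summaries need a positive max_sents")
        spec = f"{kind} MaxSents {max_sents} "
    elif kind in BYTE_TYPES:
        spec = f"{kind} "
    else:
        raise ValueError(f"unknown summary type: {kind!r}")
    # Clamp here so the file states the limit the engine will actually apply.
    spec += f"MaxBytes {min(max_bytes, MAX_SUMMARY_BYTES)}"
    return f'$define DOC-SUMMARIES "{spec}"'


print(doc_summaries_line("LS", 233, max_sents=4))
# $define DOC-SUMMARIES "LS MaxSents 4 MaxBytes 233"
```

Clamping in the helper is a design choice: the file then records the limit that will really be used, instead of a larger number the engine silently reduces.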

Editing the Master style.prm File

The master style.prm file is located in:

$SYBASE/sds/text/verity/common/style

It contains the default Full-Text Search engine style parameters. Edit this file to enable clustering, literal text in query-by-example specifications, or summarization for every text index you create. Uncomment the applicable lines as described above.

Note: Editing the master style.prm file does not change existing text indexes. To enable these features for an existing text index, re-create it as described in "Editing Individual style.prm Files" below.
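Uncommenting a line can also be scripted. The sketch below is a hypothetical Python helper, not a Sybase utility. It assumes the disabled line appears in style.prm exactly as shown above, preceded by a single "#", and it refuses to change anything unless exactly one matching commented line is found.

```python
import re


def uncomment_define(style_prm: str, define_line: str) -> str:
    """Enable one commented-out "#$define ..." line in style.prm text (hypothetical helper).

    define_line is the line as it should read once enabled, for example
    '$define DOC-FEATURES "TF"'. Exactly one commented copy must exist.
    """
    pattern = re.compile(r"^#" + re.escape(define_line) + r"[ \t]*$", re.MULTILINE)
    # A function replacement avoids any escape processing of define_line.
    new_text, count = pattern.subn(lambda _match: define_line, style_prm)
    if count != 1:
        raise ValueError(f"expected 1 commented line for {define_line!r}, found {count}")
    return new_text
```

To apply it, read the style.prm file in $SYBASE/sds/text/verity/common/style, pass its text through uncomment_define, and write the result back after saving a copy of the original.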

Editing Individual style.prm Files

Perform the following steps to enable clustering, literal text in query-by-example specifications, or summarization for an individual text index:

  1. Create the text index using sp_create_text_index. Use the word "empty" in the option_string parameter so that the style.prm file is created for the text index, but the Verity collections are not populated with data. For example, if you are enabling clustering for the copy column of the blurbs table, use the following syntax:

    sp_create_text_index "KRAZYKAT", "i_blurbs", "blurbs", "empty", "copy"
    Note: If the text index already exists, omit this step. You do not need to create the text index again.
  2. Use sp_drop_text_index to drop the text index associated with the style.prm file you are editing.

    For example, to drop the text index created in step 1, enter:

    sp_drop_text_index "blurbs.i_blurbs"
  3. Edit the style.prm file that exists for the text index. The style.prm file for an existing collection is located in:

    $SYBASE/sds/text/collections/db.owner.index/style

    where db.owner.index is the database, the database owner, and the index created with sp_create_text_index. For example, if you create a text index called i_blurbs on the pubs2 database, the full path to this directory is:

    $SYBASE/sds/text/collections/pubs2.dbo.i_blurbs/style

  4. Uncomment the applicable lines as described above.

    For example, to enable clustering, uncomment the following line:

    $define DOC-FEATURES "TF"
  5. Re-create the text index you dropped in step 2. For example, to re-create the i_blurbs text index, enter:

    sp_create_text_index "KRAZYKAT", "i_blurbs", "blurbs", "", "copy"
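Every procedure in this chapter edits files in a collection's style directory. The hypothetical Python helpers below derive that path from the naming rule in step 3 (database, owner, and index joined with periods). They assume a POSIX-style $SYBASE path and only build the path; they do not check that it exists.

```python
import posixpath


def collection_style_dir(sybase: str, database: str, owner: str, index: str) -> str:
    """Style directory for a text index: $SYBASE/sds/text/collections/db.owner.index/style.

    Hypothetical helper: it only builds the path and does not check that it exists.
    """
    collection = f"{database}.{owner}.{index}"
    return posixpath.join(sybase, "sds", "text", "collections", collection, "style")


def style_file(sybase: str, database: str, owner: str, index: str, name: str) -> str:
    """Full path of one style file (style.prm, style.vgw, style.ufl, style.dft, ...)."""
    return posixpath.join(collection_style_dir(sybase, database, owner, index), name)


print(style_file("/sybase", "pubs2", "dbo", "i_blurbs", "style.prm"))
# /sybase/sds/text/collections/pubs2.dbo.i_blurbs/style/style.prm
```

In practice you would pass the value of the SYBASE environment variable (os.environ["SYBASE"]) as the first argument.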

Setting Up a Column to Use As a Sort Specification

Before you can sort by specific columns, you must modify the style.vgw and style.ufl files. (For information on including a column in a sort specification, see "Using the sort_by Column to Specify a Sort Order".) Both files are in the directory:

$SYBASE/sds/text/collections/db.owner.index/style

where db.owner.index is the database, the database owner, and the index created using sp_create_text_index. For example, if you created a text index called i_blurbs on the pubs2 database, those files would be in a directory similar to:

$SYBASE/sds/text/collections/pubs2.dbo.i_blurbs/style

To edit the style.vgw and style.ufl files, follow these steps:

  1. Drop the text index that contains the columns for which you are adding definitions.

    For example, to add definitions for the copy column in the blurbs table, use the following command to drop the text index:

    sp_drop_text_index "blurbs.i_blurbs"
  2. Edit the style.vgw file. Following this line:

    dda: "SybaseTextServer"
    add an entry for the column you are defining. The syntax is:

    table: DOCUMENTS
    {
    copy: fcolumn_number copy_fcolumn_number
    }
    where column_number is the number of the column you are defining. Column numbers start with 0; if you want the first column to be sorted, specify "f0"; to sort the second column, specify "f1"; to sort the third column, specify "f2", and so on.

    For example, to define the first column in a table, the syntax is:

    table: DOCUMENTS
    {
    copy: f0 copy_f0
    }
    Your style.vgw file will then look similar to this:

    #
    # Sybase Text Server Gateway
    #
    $control: 1
    gateway:
    {
    dda: "SybaseTextServer"
    {
    table: DOCUMENTS
    {
    copy: f0 copy_f0
    }
    }
    }
  3. Edit the style.ufl file by adding the column definition for a data table named fts. The syntax is:

    data-table:            fts
    {
    fixwidth: copy_fcolumn_number precision datatype
    }
    Column numbers start with 0; if you want the first column to be sorted, specify "f0"; to sort the second column, specify "f1"; to sort the third column, specify "f2", and so on. For example, to add a definition for the first column of a table, with a precision of 4, and a datatype of date, enter:

    data-table:            fts
    {
    fixwidth: copy_f0 4 date
    }
    Similarly, to add a definition for the second column of a table with a precision of 10 and a datatype of text (character data), enter:

    data-table:            fts
    {
    fixwidth: copy_f1 10 text
    }
  4. Re-create the index, using sp_create_text_index.
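When several columns need sort definitions, the entries from steps 2 and 3 can be generated rather than typed. This hypothetical Python sketch emits blocks in the format shown above, using single spaces instead of aligned columns. It accepts only the date and text datatypes used in this chapter, and you still paste the output into style.vgw and style.ufl yourself.

```python
def vgw_entry(column_number: int) -> str:
    """style.vgw block that defines sortable column N (hypothetical helper)."""
    if column_number < 0:
        raise ValueError("column numbers start at 0")
    field = f"f{column_number}"
    return f"table: DOCUMENTS\n{{\ncopy: {field} copy_{field}\n}}\n"


def ufl_entry(column_number: int, precision: int, datatype: str) -> str:
    """style.ufl fixwidth block for the same column (hypothetical helper)."""
    if column_number < 0:
        raise ValueError("column numbers start at 0")
    # Only the two datatypes that appear in this chapter are accepted.
    if datatype not in {"date", "text"}:
        raise ValueError(f"unsupported datatype: {datatype!r}")
    return f"data-table: fts\n{{\nfixwidth: copy_f{column_number} {precision} {datatype}\n}}\n"


print(vgw_entry(0), end="")
print(ufl_entry(0, 4, "date"), end="")
```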

Using Filters on Text That Contains Tags

To perform accurate searches on documents that contain tags (such as HTML or PostScript), the text index must use a filter to strip out the tags. The Standard Full-Text Search engine provides filtering for SGML and HTML documents. The Enhanced Full-Text Search engine provides filters for a variety of document types (Microsoft Word, FrameMaker, WordPerfect, SGML, HTML, and so on).

When a text index uses a filter, the data for each type of tag in the document is placed into its own document zone. For example, if you have a tag called "chapter," all chapter names are placed into one document zone. You can then issue a query that searches the entire document, or only the data in the "chapter" zone (for more information, see "in").

To create a text index that uses a filter, modify the style.dft file for that text index. To edit the style.dft file, follow these steps:

  1. Create the text index using sp_create_text_index. Use the word "empty" in the option_string parameter so that the style.dft file is created for the text index, but the Verity collections are not populated with data. For example, to create a text index for the copy column of the blurbs table, use the following syntax:

    sp_create_text_index "KRAZYKAT", "i_blurbs", "blurbs", "empty", "copy"
    WARNING! You should specify only one column in the text index when the text index uses a filter.
  2. Drop the text index that you created in step 1. This drops the text index, but not the style.dft file. For example, use the following command to drop the i_blurbs text index:

    sp_drop_text_index "blurbs.i_blurbs"
  3. Edit the style.dft file. The style.dft file is in the directory:

    $SYBASE/sds/text/collections/db.owner.index/style

    where db.owner.index is the database, the database owner, and the index created using sp_create_text_index. For example, if you created a text index called i_blurbs on the pubs2 database, the style.dft file would be in a directory similar to:

    $SYBASE/sds/text/collections/pubs2.dbo.i_blurbs/style

    Following this line:

    field:        f0
    add syntax to use a filter.

    With the Standard Full-Text Search engine, the filter syntax depends on the document type. For SGML documents, use:

    /filter="zone -nocharmap"

    With the Enhanced Full-Text Search engine, use the following syntax for all document types:

    /filter="universal"
    For example, your style.dft file for an SGML document in the Standard version will look like this:

    $control: 1
    dft:
    {
    field: f0
    /filter="zone -nocharmap"
    field: f1
    field: f2
    .
    .
    field: f15
    }
    Your style.dft file for an SGML document in the Enhanced version will look like this:

    $control: 1
    dft:
    {
    field: f0
    /filter="universal"
    field: f1
    field: f2
    .
    .
    field: f15
    }
  4. Re-create the index, using sp_create_text_index. For example:

    sp_create_text_index "KRAZYKAT", "i_blurbs", "blurbs", "", "copy"
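Step 3 amounts to a small text transformation. This hypothetical Python helper inserts a /filter line immediately after the field: f0 line. It assumes the style.dft layout shown in the examples above and does not check whether a filter line is already present.

```python
def add_filter(style_dft: str, filter_spec: str, field: str = "f0") -> str:
    """Insert '/filter="<filter_spec>"' right after 'field: <field>' (hypothetical helper)."""
    out = []
    inserted = False
    for line in style_dft.splitlines():
        out.append(line)
        # split() tolerates the extra spaces in lines such as "field:        f0".
        if not inserted and line.split() == ["field:", field]:
            out.append(f'/filter="{filter_spec}"')
            inserted = True
    if not inserted:
        raise ValueError(f"no 'field: {field}' line found")
    return "\n".join(out) + "\n"
```

For example, add_filter(text, "universal") produces the Enhanced version's setting, and add_filter(text, "zone -nocharmap") the Standard version's SGML setting.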

Creating a Custom Thesaurus (Enhanced Version Only)

The Verity thesaurus operator expands a search to include the specified word and its synonyms (for information on using the thesaurus operator, see "thesaurus"). In the Enhanced version of the Full-Text Search engine, you can create a custom thesaurus that contains application-specific synonyms to use in place of the default thesaurus.

For example, the default English language thesaurus contains these words as synonyms for "money": "cash," "currency," "lucre," "wampum," and "greenbacks." You can create a custom thesaurus that contains a different set of synonyms for "money," such as "bid," "tokens," "credit," "asset," and "verbal offer."

To create a custom thesaurus, follow these steps:

  1. Make a list of the synonyms that you will use with your application. It may help to examine the default thesaurus (see "Examining the Default Thesaurus (Optional)").

  2. Create a control file that contains the synonyms you are defining for your custom thesaurus (see "Creating the Control File").

  3. Create the custom thesaurus using the mksyd utility (see "Creating the Thesaurus"). This uses the control file as input.

  4. Replace the default thesaurus with your custom thesaurus (see "Replacing the Default Thesaurus with the Custom Thesaurus").

For more information on "Custom Thesaurus Support" and the mksyd utility, see the Verity Web site at:

http://www.verity.com

In the Enhanced version of the Full-Text Search engine, two sample files illustrate how to set up and use a custom thesaurus.

These files are in the $SYBASE/sds/text/sample/scripts directory.

Examining the Default Thesaurus (Optional)

A control file contains all the synonym definitions for a thesaurus. To examine the default thesaurus, create its control file using the mksyd utility. Use the syntax:

mksyd -dump -syd $SYBASE/sds/text/verity/common/vdkLanguage/vdk20.syd -f work_location/control_file.ctl

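where:

vdkLanguage is the value of the vdkLanguage configuration parameter (for example, english0).
work_location is the directory in which mksyd writes the control file.
control_file is the name of the control file to create.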

Examine the control file (control_file.ctl) that it creates to view the default synonym lists.
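If you script this step, build the command from the locations the chapter names: mksyd in $SYBASE/sds/text/verity/bin and the default vdk20.syd in the vdkLanguage directory. The helper below is a hypothetical Python sketch that only assembles the argument list; it uses no mksyd options other than -dump, -syd, and -f.

```python
import posixpath
from typing import List


def dump_thesaurus_argv(sybase: str, vdk_language: str, control_file: str) -> List[str]:
    """Command line that dumps the default thesaurus to a control file (hypothetical helper)."""
    mksyd = posixpath.join(sybase, "sds", "text", "verity", "bin", "mksyd")
    default_syd = posixpath.join(sybase, "sds", "text", "verity", "common",
                                 vdk_language, "vdk20.syd")
    return [mksyd, "-dump", "-syd", default_syd, "-f", control_file]


print(" ".join(dump_thesaurus_argv("/sybase", "english0", "/tmp/default.ctl")))
```

Passing the list to subprocess.run(argv, check=True) runs the command without going through a shell, which avoids quoting problems with paths.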

Creating the Control File

Create a control file that contains the new synonyms for your custom thesaurus. The control file is an ASCII text file in a structured format; create or edit it with a text editor such as vi or emacs.

Control File Syntax

The control file contains synonym list definitions in a synonyms: statement. For example, the following is a control file named colors.ctl:

$control: 1
synonyms:
{
list: "red, ruby, scarlet, fuchsia,\
magenta"
list: "electric blue <or> azure"
/keys = "lapis"
}
$$

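In this example, the synonyms: statement contains two list: definitions; the second one also uses the <or> operator and the /keys modifier.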

For more complex examples of control files, see the Verity Web site.
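Simple control files can be generated from a list of synonym groups. This hypothetical Python sketch writes only the plain list: "a, b, c" form shown in colors.ctl; it does not generate operators such as <or> or modifiers such as /keys, and it rejects words containing quotes or commas because those would break the list syntax.

```python
from typing import Sequence


def synonyms_control_file(synonym_lists: Sequence[Sequence[str]]) -> str:
    """Render a mksyd control file from groups of synonyms (hypothetical helper)."""
    lines = ["$control: 1", "synonyms:", "{"]
    for words in synonym_lists:
        if not words:
            raise ValueError("empty synonym list")
        if any('"' in w or "," in w for w in words):
            raise ValueError("words may not contain quotes or commas")
        lines.append(f'list: "{", ".join(words)}"')
    lines += ["}", "$$"]
    return "\n".join(lines) + "\n"


print(synonyms_control_file([["money", "bid", "tokens", "credit", "asset", "verbal offer"]]), end="")
```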

Creating the Thesaurus

The mksyd utility creates the custom thesaurus using a control file as input. It is located in:

$SYBASE/sds/text/verity/bin

Run, or define an alias to run, mksyd from this bin directory. Create your custom thesaurus in any work directory.

The mksyd syntax for creating a custom thesaurus is:

 mksyd -f control_file.ctl -syd custom_thesaurus.syd

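where:

control_file.ctl is the control file that contains your synonym definitions.
custom_thesaurus.syd is the name of the custom thesaurus file that mksyd creates.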

For example, to run mksyd on the sample control file defined above and write the custom thesaurus to a work directory, enter:

mksyd -f /usr/u/sybase/dba/thesaurus/colors.ctl -syd /usr/u/sybase/dba/thesaurus/custom.syd 
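Wrapping the call lets a build script stop on the first failure. This is a hypothetical Python sketch that uses only the -f and -syd options shown above; it assumes mksyd reports errors through a non-zero exit status, which this chapter does not state.

```python
import posixpath
import subprocess
from typing import List


def build_thesaurus(sybase: str, control_file: str, output_syd: str,
                    run: bool = True) -> List[str]:
    """Compile a control file into a .syd thesaurus with mksyd (hypothetical wrapper)."""
    mksyd = posixpath.join(sybase, "sds", "text", "verity", "bin", "mksyd")
    argv = [mksyd, "-f", control_file, "-syd", output_syd]
    if run:
        # check=True raises CalledProcessError if mksyd exits with a non-zero status.
        subprocess.run(argv, check=True)
    return argv
```

Calling it with run=False returns the command without executing it, which is convenient for logging or a dry run.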

Replacing the Default Thesaurus with the Custom Thesaurus

The default thesaurus, vdk20.syd, is located in:

$SYBASE/sds/text/verity/common/vdkLanguage

where vdkLanguage is the value of the vdkLanguage configuration parameter (for example, the English directory is $SYBASE/sds/text/verity/common/english0). Each application and user reading from this location at runtime uses this thesaurus. To replace it with your custom thesaurus, follow these steps:

  1. Back up the default thesaurus before replacing it with the custom thesaurus. For example:

    mv /sybase/sds/text/verity/common/english0/vdk20.syd default.syd
  2. Replace the vdk20.syd file with your custom thesaurus. For example:

    cp custom.syd /sybase/sds/text/verity/common/english0/vdk20.syd
  3. Restart your Full-Text Search engine; no configuration file changes are required. The thesaurus is read from this location when the Full-Text Search engine is started, not when a query is executed.

Queries using the thesaurus operator will now use the custom thesaurus.
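The backup and replacement steps can be combined in one script. This hypothetical Python sketch copies the default file instead of moving it (the mv example above moves it), so that vdk20.syd is never absent if the second copy fails. backup_path is wherever you choose to keep the original, and the helper refuses to overwrite an existing backup.

```python
import os
import shutil


def install_custom_thesaurus(custom_syd: str, language_dir: str, backup_path: str) -> None:
    """Back up vdk20.syd, then replace it with custom_syd (hypothetical helper).

    Restart the Full-Text Search engine afterwards; the thesaurus is read at startup.
    """
    target = os.path.join(language_dir, "vdk20.syd")
    if os.path.exists(backup_path):
        raise FileExistsError(f"backup already exists: {backup_path}")
    # Copy rather than move, so vdk20.syd is never missing if the next step fails.
    shutil.copy2(target, backup_path)
    shutil.copy2(custom_syd, target)
```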

Creating Topics (Enhanced Version Only)

A topic is a grouping of information related to a concept or subject area. With topic definitions in place, a user can perform searches on the topic instead of having to write queries with complex syntax.

A user creates topics, which can combine words and phrases, Verity operators and modifiers, and weight values. Any user can then query the topic.

Before you create topics, determine your application requirements, and establish standards for naming conventions and for the locations of your outline files, topic set directories, and knowledge base map.

To implement topics, perform the following steps:

  1. Create one or more outline input files to define your topics (see "Creating an Outline File")

  2. Create and populate a topic set directory, using the mktopics utility (see "Creating a Topic Set Directory")

  3. Create a knowledge base map, specifying the locations of one or more topic set directories (see "Creating a Knowledge Base Map")

  4. Set the knowledge_base configuration parameter to point to the location of the knowledge base map (see "Defining the Location of the Knowledge Base Map")

  5. Execute queries against defined topics.

For more information about outline formats, operator precedence rules, and the mktopics utility, see the Verity Web site:

http://www.verity.com.

See also the Verity document Search '97 Introduction to Topics.

Several sample files illustrate the topics feature, including sample_text_topics.sql.

These files are in the $SYBASE/sds/text/sample/scripts directory.

Creating an Outline File

A topic outline file specifies all the combinations of words and phrases, Verity operators and modifiers, and weight values that the search engine uses when you issue a query using the topic. The outline file is an ASCII text file in a structured format.

For example, the following outline file defines the topic "saint-bernard":

$control: 1 
saint-bernard <accrue>
*0.80 "Saint Bernard"
*0.80 "St. Bernard"
* "working dogs"
* "large dogs"
* "European breeds"
$$

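When you issue a query specifying the topic "saint-bernard", the Full-Text Search engine applies the <accrue> operator to the five phrases in the outline, giving "Saint Bernard" and "St. Bernard" a weight of 0.80.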

This example is a very basic topic definition; an outline can also express more complex relationships.

For complex examples of outline files, see the Verity Web site.
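Simple one-level outlines like this one can be generated from a table of phrases and weights. This hypothetical Python sketch reproduces the format of the saint-bernard example (weights written with two decimal places, a bare * when no weight is given); it does not handle nested topics or modifiers.

```python
from typing import Optional, Sequence, Tuple


def topic_outline(name: str, operator: str,
                  children: Sequence[Tuple[Optional[float], str]]) -> str:
    """Render a one-level topic outline file (hypothetical helper).

    Each child is (weight, phrase); a weight of None writes a bare "*".
    """
    lines = ["$control: 1", f"{name} <{operator}>"]
    for weight, phrase in children:
        prefix = "*" if weight is None else f"*{weight:.2f}"
        lines.append(f'{prefix} "{phrase}"')
    lines.append("$$")
    return "\n".join(lines) + "\n"


print(topic_outline("saint-bernard", "accrue",
                    [(0.8, "Saint Bernard"), (0.8, "St. Bernard"), (None, "working dogs")]), end="")
```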

Note: In Windows NT, you can use the graphical user interface of the Verity topicEDITOR product to create topic outlines. It is available from Verity. If you use topicEDITOR, it automatically creates a topic set directory, and you can go to "Creating a Knowledge Base Map" to continue setting up your topics.

Creating a Topic Set Directory

Use the mktopics utility to create and populate a topic set directory. It is located in:

$SYBASE/sds/text/verity/bin

Run, or define an alias to run, mktopics from this bin directory. You can create a topic set directory or directories in any work directory.

The mktopics syntax is:

mktopics -outline outline_file.otl -topicset topic_set_directory

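where:

outline_file.otl is the outline file that defines your topics.
topic_set_directory is the topic set directory that mktopics creates and populates.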

For example, to run mktopics on the saint-bernard.otl file defined above and write the topic set directory to a work directory, enter:

mktopics -outline /usr/u/sybase/topic_outlines/saint-bernard.otl -topicset /usr/u/sybase/topic_sets/saint-bernard_topic
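The command can be wrapped for use in a build script. This hypothetical Python sketch uses only the -outline and -topicset options shown above and assumes that mktopics reports failure with a non-zero exit status, which this chapter does not state.

```python
import posixpath
import subprocess
from typing import List


def build_topic_set(sybase: str, outline_file: str, topic_set_dir: str,
                    run: bool = True) -> List[str]:
    """Run mktopics to create and populate a topic set directory (hypothetical wrapper)."""
    mktopics = posixpath.join(sybase, "sds", "text", "verity", "bin", "mktopics")
    argv = [mktopics, "-outline", outline_file, "-topicset", topic_set_dir]
    if run:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(argv, check=True)
    return argv
```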

Creating a Knowledge Base Map

A knowledge base map specifies the locations of one or more topic set directories. Create an ASCII knowledge base map file that defines the fully-qualified directory paths to your topic sets.

For example, the following knowledge base map file illustrates how you can list multiple knowledge bases in the map. The first entry identifies the topic set directory created with mktopics above.

$control: 1
kbases:
{
kb:
/kb-path = /usr/u/sybase/topic_sets/saint-bernard_topic
kb:
/kb-path = /usr/u/sybase/topic_sets/another_topic
}
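A map listing many topic sets can be generated from a list of directories. This hypothetical Python sketch follows the structure of the example above, one kb: entry per directory, and rejects relative paths because the map requires fully qualified directory paths.

```python
import posixpath
from typing import Iterable


def knowledge_base_map(topic_set_dirs: Iterable[str]) -> str:
    """Render a knowledge base map listing topic set directories (hypothetical helper)."""
    lines = ["$control: 1", "kbases:", "{"]
    for path in topic_set_dirs:
        # The map must use fully qualified directory paths.
        if not posixpath.isabs(path):
            raise ValueError(f"not a fully qualified path: {path}")
        lines += ["kb:", f"/kb-path = {path}"]
    lines.append("}")
    return "\n".join(lines) + "\n"
```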

Defining the Location of the Knowledge Base Map

Set the knowledge_base configuration parameter to point to the location of the knowledge base map. For example:

sp_text_configure KRAZYKAT, 'knowledge_base', '/usr/u/sybase/topic_sets/sample_text_topics.kbm'

The knowledge_base configuration parameter is static, and you must restart the Full-Text Search engine for the definition to take effect.

Executing Queries Against Defined Topics

You can now execute queries using the defined topic instead of a complex query. For example, before you create the "saint-bernard" topic, you would have to use the following syntax:

...where i.index_any = "<accrue> ([80]Saint Bernard, [80]St. Bernard, working dogs, large dogs, European breeds)"

to find documents that contain one or more of these phrases.

After you create the topic "saint-bernard", you can use this syntax:

...where i.index_any = "<topic>saint-bernard"

or:

...where i.index_any = "saint bernard"
Note: If you enter a word in a query expression, the Full-Text Search engine tries to match it with a topic name. If you enter a phrase in a query expression, the Full-Text Search engine replaces spaces with hyphens (-), and then tries to match it with a topic name. For example, the Full-Text Search engine matches "saint bernard" with the topic "saint-bernard".
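The rule in the note above can be modeled in a few lines. The following hypothetical Python sketch is for illustration only: it replaces each space with a hyphen and looks the result up in a set of topic names. It assumes an exact, case-sensitive comparison, which this chapter does not specify.

```python
from typing import AbstractSet, Optional


def match_topic(expression: str, topic_names: AbstractSet[str]) -> Optional[str]:
    """Illustrate the documented rule: spaces become hyphens, then match a topic name.

    Hypothetical model only; assumes an exact, case-sensitive comparison.
    """
    candidate = expression.replace(" ", "-")
    return candidate if candidate in topic_names else None


print(match_topic("saint bernard", {"saint-bernard"}))
# saint-bernard
```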

See the sample_text_topics.sql file for examples of using topics in queries.

Troubleshooting Topics

If the knowledge_base configuration parameter specifies a knowledge base map file that does not exist, the Full-Text Search engine will not be able to start a session with Verity, and the server will not start. If the map file exists but contains invalid entries, Verity issues warning messages at start-up time.
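Because a missing map file prevents the server from starting, it can be worth checking the map before you restart. The hypothetical Python check below reports a missing map file and any /kb-path entry whose directory does not exist. It is not a Verity validator and cannot detect the other invalid entries that Verity warns about.

```python
import os
import re
from typing import List

# Matches lines such as "/kb-path = /usr/u/sybase/topic_sets/saint-bernard_topic".
KB_PATH_LINE = re.compile(r"^[ \t]*/kb-path[ \t]*=[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def check_knowledge_base_map(kbm_path: str) -> List[str]:
    """List problems in a knowledge base map before restarting the engine (hypothetical check)."""
    if not os.path.isfile(kbm_path):
        return [f"map file not found: {kbm_path}"]
    with open(kbm_path, encoding="utf-8") as f:
        text = f.read()
    paths = KB_PATH_LINE.findall(text)
    if not paths:
        return ["no /kb-path entries found"]
    return [f"topic set directory not found: {p}" for p in paths if not os.path.isdir(p)]
```

An empty result means the map file exists and every listed topic set directory is present.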

