Sybase Technical Library - Product Manuals Home
[Search Forms] [Previous Section with Hits] [Next Section with Hits] [Clear Search] Expand Search

Chapter 1: Introduction [Table of Contents] Chapter 3: Configuring Adaptive Server for Full-Text Searches

Full-Text Search Specialty Data Store User's Guide

[-] Chapter 2: Understanding the Full-Text Search Engine

Chapter 2

Understanding the Full-Text Search Engine

This chapter describes how a Full-Text Search engine works. Topics include:

Components of the Full-Text Search Engine

The Full-Text Search engine uses the following components to provide full-text search capabilities:

The Source Table

The source table is a user table maintained by Adaptive Server. It contains one or more columns using the text, image, char, varchar, datetime, or small datetime datatype, which holds the data to be searched. With the Enhanced Full-Text Search engine, the source table can also have int, smallint, and tinyint columns, which holds the data to be searched. The source table must have an IDENTITY column, which is used to join with the id column of an index table during text searches.

The source table can be a local table, which holds the actual data, or it can be a proxy table that is mapped to remote data.

The Verity Collections

The Full-Text Search engine uses the Verity collections, which are located in $SYBASE/sds/text/collections. When you create the text indexes, as described in "Creating the Text Index and Index Table", Verity creates a collection, which is a directory that implements a text index. This collection is queried by the Full-Text Search engine. For more information about Verity collections, see the Verity Web site:

http://www.verity.com

Filters

The text index uses a filter to strip out the tags in a document that is not ASCII text. The Standard Full-Text Search engine provides filtering for SGML and HTML documents. The Enhanced Full-Text Search engine provides filters for a variety of document types (Microsoft Word, FrameMaker, WordPerfect, SGML, HTML, and so on).

The text_db Database

During the installation of the Full-Text Search engine, a database named text_db is added to Adaptive Server using the installation script installtextserver, as described in "Running the installtextserver Script". The database does not contain any user data, but contains two support tables: vesaux and vesauxcol. These tables contain the metadata used by the Full-Text Search engine to maintain integrity between the Adaptive Server source tables and the Verity collections.

When updating the collections after an insert, update, or delete is made to an indexed column, the Full-Text Search engine queries the vesaux and vesauxcol tables. These tables determine which collections contain the modified columns so that all affected collections are updated. The Full-Text Search engine also uses these tables when it is brought online, to make sure that all necessary collections exist.

The vesaux Table

The columns in the vesaux table are described in Table 2-1.

Table 2-1: Columns in the vesaux table

Column Name

Description

id

IDENTITY column

object_name

Name of the source table on which the external index is being created

option_string

Text index creation options

collection_id

Name of the Verity collection

key_column

Name of the IDENTITY column in the source table

svrid

Server ID of the Full-Text Search engine maintaining the collection

The vesauxcol Table

The columns in the vesauxcol table are described in Table 2-2.

Table 2-2: Columns in the vesauxcol table

Column Name

Description

id

ID of the referenced row in the vesaux table

col_name

Name of the column for which you are searching

col_type

Column type (text, image, char, varchar, datetime, smalldatetime; with the Enhanced Full-Text Search engine, also int, smallint, and tinyint)

The Index Table

The index table provides a means of locating and searching documents stored in the source table. The index table is maintained by the Full-Text Search engine and has an id column that maps to the IDENTITY column of the corresponding source table. The IDENTITY value from the row in the source table is stored with the data in the Verity collections, which allows the source and index tables to be joined. Although the index table is stored and maintained by the Full-Text Search engine, it functions as a local table to Adaptive Server through the Component Integration Services feature.

The index table contains special columns, called pseudo columns, that are used by the Full-Text Search engine to determine the parameters of the search and the location of the text data in the source table. Pseudo columns have no associated physical storage¾the values of a pseudo column are valid only for the duration of the query and are removed immediately after the query finishes running.

For example, when you use the score pseudo column in a query, to rank each document according to how well the document matches the query, you may have to use a score of 15 to find references to the phrase "small Saint Bernards" in the text database. This phrase does not occur very often, and a low score value broadens the search to include documents that have a small number of occurrences of the search criteria. However, if you are searching for a phrase that is common, like "large Saint Bernards," you could use a score of 90, which would limit the search to those documents that have many occurrences of the search criteria.

You use the score column and the other pseudo columns, id, index_any, sort_by, summary, and max_docs, to specify the parameters to include in your search. For a description of the pseudo columns, see "Pseudo Columns in the Index Table".

The text_events Table

Each database containing tables referenced by a text index must contain an events table, which logs inserts, updates, and deletes to indexed columns. The name of this table is text_events. It is used to propagate updated data to the Verity collections.

The columns in the text_events table are described in Table 2-3.

Table 2-3: Columns in the text_events table

Column Name

Description

event_id

IDENTITY column.

id

ID of the row that was updated, inserted, or deleted.

tableid

Name of the table that contains the row that was updated, inserted, or deleted.

columnid

Name of the column that the text index was created on.

event_date

Date and time of the update, insert, or delete.

event_type

Type of update (update, insert, or delete).

event_status

Indicates whether the update, insert, or delete has been propagated to the collections.

srvid

Server ID of the Full-Text Search engine maintaining the collection.

Relationships Between the Components

The relationships between the Full-Text Search engine components are shown in Figure 2-1.

Figure 2-1: Components of the Full-Text Search engine
raster

How a Full-Text Search Works

To perform a full-text search, you enter a select statement that joins the IDENTITY column from the source table with the id column of the index table, using pseudo columns as needed to define the search. For example, the following query searches for documents in the blurbs table of the pubs2 database in which the word "Greek" appears near the word "Gustibus" (the i_blurbs table is the index table):

select t1.score, t2.copy
from i_blurbs t1, blurbs t2
where t1.id=t2.id and t1.score > 20
and t1.max_docs = 10
and t1.index_any = "<near>(Greek, Gustibus)"

Adaptive Server and the Full-Text Search engine split the query processing, as follows:

  1. The Full-Text Search engine processes the query:

    select t1.score, t1.id
    from i_blurbs t1
    where t1.score > 20
    and t1.max_docs = 10
    and t1.index_any = "<near>(Greek, Gustibus)"
    The select statement includes the Verity operator near and the pseudo columns score, max_docs, and index_any. The operator and pseudo columns provide the parameters for the search on the Verity collections¾they narrow the result set from the entire copy column to the 10 documents in which the words "Greek" and "Gustibus" appear closest to each other.

  2. Adaptive Server processes the following select statement on the result set that is returned by the Full-Text Search engine in step 1:

    select t1.score, t2.copy
    from i_blurbs t1, blurbs t2
    where t1.id=t2.id
    This joins the blurbs and i_blurbs tables (the source table and the index table, respectively) on the IDENTITY column of the blurbs table and the id column of the i_blurbs table.

Figure 2-2 describes how Adaptive Server and the Full-Text Search engine process the query.

Figure 2-2: Processing a full-text search query
raster


Chapter 1: Introduction [Table of Contents] Chapter 3: Configuring Adaptive Server for Full-Text Searches