Sybase Technical Library - Product Manuals Home
[Search Forms] [Previous Section with Hits] [Next Section with Hits] [Clear Search] Expand Search

Chapter 3: Data Storage [Table of Contents] Chapter 5: Understanding the Query Optimizer

Performance and Tuning Guide

[-] Chapter 4: How Indexes Work

Chapter 4

How Indexes Work

This chapter describes how Adaptive Server stores indexes and how it uses indexes to speed data retrieval for select, update, delete, and insert operations. This chapter contains the following sections:

From Heaps of Pages to Fast Performance

Indexes are the most important physical design element in improving database performance:

In addition to their performance benefits, indexes can enforce the uniqueness of data.

Indexing requires trade-offs. Although indexes speed data retrieval, they can slow down data modifications, since most changes to the data also require updating the indexes. Optimal indexing demands:

What Are Indexes?

Indexes are database objects that can be created for a table to speed direct access to specific data rows. Indexes store the values of the key(s) that were named when the index was created and logical pointers to the data pages or to other index pages (see Figure 4-1).

Figure 4-1: A simplified index schematic
raster

Types of Indexes

Adaptive Server provides two types of indexes:

You can only create one clustered index on a table because there is only one possible physical ordering of the data rows. You can create up to 249 nonclustered indexes per table.

A table that has no clustered index is called a heap. The rows in the table are in no particular order, and all new rows are added to the end of the table. Chapter 3, "Data Storage," discusses heaps and SQL operations on heaps.

Index Pages

Index entries are stored as rows on index pages in a format similar to the format used for data rows on data pages. Index entries store the key values and pointers to lower levels of the index, to the data pages, or to individual data rows. Adaptive Server uses B-tree indexing, so each node in the index structure can have multiple children.

Index entries are usually much smaller than a data row in a data page, and index pages are much more densely populated than data pages. If a data row has 200 bytes (including row overhead), there are 10 rows per page. An index on a 15-byte field has about 100 rows per index page (the pointers require 4-9 bytes per row, depending on the type of index and the index level).

Indexes can have multiple levels:

Root Level

The root level is the highest level of the index. There is only one root page. If an allpages-locked table is very small, so that the entire index fits on a single page, there are no intermediate or leaf levels, and the root page stores pointers to the data pages. Data-only-locked tables always have a leaf level between the root page and the data pages.

For larger tables, the root page stores pointers to the intermediate level index pages or to leaf-level pages.

.

Figure 4-2: Data-only-locked tables always contain root and leaf levels
raster

Leaf Level

The lowest level of the index is the leaf level. At the leaf level, the index contains a key value for each row in the table, and the rows are stored in sorted order by the index key:

Intermediate Level

All levels between the root and leaf levels are intermediate levels. An index on a large table or an index using long keys may have many intermediate levels. A very small allpages-locked table may not have an intermediate level at all; the root pages point directly to the leaf level.

Clustered Indexes on Allpages-Locked Tables

In clustered indexes on allpages-locked tables, leaf-level pages are also the data pages, and all rows are kept in physical order by the keys. Clustered indexes on allpages-locked tables are structurally different than nonclustered indexes.

Note: , For data-only-locked tables, clustered indexes are structured like nonclustered indexes. For information on clustered indexes on data-only-locked tables, see "Clustered Indexes on Data-Only-Locked Tables".

In an allpages-locked table with a clustered index, data rows are physically ordered by the index key. Physical ordering means that:

On the root and intermediate pages, each entry points to a page on the next level. Figure 4-3 shows a clustered index.

Figure 4-3: Clustered index on last name, on an allpages-locked table
raster

Clustered Indexes and Select Operations

To select a particular last name using a clustered index, Adaptive Server first uses sysindexes to find the root page. It examines the values on the root page and then follows page pointers, performing a binary search on each page it accesses as it traverses the index. In Figure 4-4, there is a clustered index on the "last name" column.

Figure 4-4: Selecting a row using a clustered index, allpages-locked table
raster

On the root level page, "Green" is greater than "Bennet," but less than Karsen, so the pointer for "Bennet" is followed to page 1007. On page 1007, "Green" is greater than "Greane," but less than "Hunter," so the pointer to page 1133 is followed to the data page, where the row is located and returned to the user.

This retrieval via the clustered index requires:

These reads may come either from cache (called a logical read) or from disk (called a physical read). On tables that are frequently used, the higher levels of the indexes are often found in cache, with lower levels and data pages being read from disk.

Clustered Indexes and Insert Operations

When you insert a row into an allpages-locked table with a clustered index, the data row must be placed in physical order according to the key value on the table. Other rows on the data page move down on the page, as needed, to make room for the new value. As long as there is room for the new row on the page, the insert does not affect any other pages in the database. The clustered index is used to find the location for the new row.

Figure 4-5 shows a simple case where there is room on an existing data page for the new row. In this case, the key values in the index do not need to change.

Figure 4-5: Inserting a row into an allpages-locked table with a clustered index
raster

Page Splitting on Full Data Pages

If there is not enough room on the data page for the new row, a page split must be performed:

In some cases, page splitting is handled slightly differently. See "Exceptions to Page Splitting".

In Figure 4-6, the page split requires adding a new row to an existing index page, page 1007.

Figure 4-6: Page splitting in an allpages-locked table with a clustered index
raster

Exceptions to Page Splitting

There are exceptions to 50-50 page splits:

Page Splitting on Index Pages

If a new row needs to be added to a full index page, the page split process on the index page is similar to the data page split. A new page is allocated, and half of the index rows are moved to the new page. A new row is inserted at the next highest level of the index to point to the new index page.

Performance Impacts of Page Splitting

Page splits are expensive operations. In addition to the actual work of moving rows, allocating pages, and logging the operations, the cost is increased by:

When you create a clustered index for a table that will grow over time, you may want to use fillfactor to leave room on data pages and index pages. This reduces the number of page splits for a time. See "Choosing Space Management Properties for Indexes".

Overflow Pages

Special overflow pages are created for nonunique clustered indexes on allpages-locked tables when a newly inserted row has the same key as the last row on a full data page. A new data page is allocated and linked into the page chain, and the newly inserted row is placed on the new page (see Figure 4-7).

Figure 4-7: Adding an overflow page to a clustered index, allpages-locked table
raster

The only rows that will be placed on this overflow page are additional rows with the same key value. In a nonunique clustered index with many duplicate key values, there can be numerous overflow pages for the same value.

The clustered index does not contain pointers directly to overflow pages. Instead, the next page pointers are used to follow the chain of overflow pages until a value is found that does not match the search value.

Clustered Indexes and Delete Operations

When you delete a row from an allpages-locked table that has a clustered index, other rows on the page are moved up to fill the empty space so that the data remains contiguous on the page. Figure 4-8 shows a page that has four rows before a delete operation removes the second row on the page. The two rows that follow the deleted row are moved up.

Figure 4-8: Deleting a row from a table with a clustered index
raster

Deleting the Last Row on a Page

If you delete the last row on a data page, the page is deallocated and the next and previous page pointers on the adjacent pages are changed. The rows that point to that page in the leaf and intermediate levels of the index are removed.

If the deallocated data page is on the same extent as other pages belonging to the table, it can be used again when that table needs an additional page. If the deallocated data page is the last page on the extent that belongs to the table, the extent is also deallocated and becomes available for the expansion of other objects in the database.

Figure 4-9: Deleting the last row on a page (before the delete)
raster

In Figure 4-10, which shows the table after the delete, the pointer to the deleted page has been removed from index page 1007 and the following index rows on the page have been moved up to keep the used space contiguous.

Figure 4-10: Deleting the last row on a page (after the delete)
raster

Index Page Merges

If you delete a pointer from an index page, leaving only one row on that page, the row is moved onto an adjacent page, and the empty page is deallocated. The pointers on the parent page are updated to reflect the changes.

Nonclustered Indexes

The B-tree works much the same for nonclustered indexes as it does for clustered indexes, but there are some differences. In nonclustered indexes:

With keys of the same size, nonclustered indexes require more space than clustered indexes.

Leaf Pages Revisited

To clearly understand the difference between clustered indexes on allpages-locked tables and other indexes, it is important to recall the definition of the leaf page of an index: it is the lowest level of the index where all of the keys for the index appear in sorted order.

In clustered indexes on allpages-locked tables, the data rows are stored in order by the index keys, so by definition, the data level is the leaf level. There is no other level of the clustered index that contains one index row for each data row. Clustered indexes on allpages-locked tables are sparse indexes. The level above the data contains one pointer for every data page, not data row.

In nonclustered indexes and clustered indexes on data-only-locked tables, the level just above the data is the leaf level: it contains a key-pointer pair for each data row. These indexes are dense. At the level above the data, they contain one index row for each data row.

Nonclustered Index Structure

The table in Figure 4-11 shows a nonclustered index on lname. The data rows at the far right show pages in ascending order by employee_id (10, 11, 12, and so on) because there is a clustered index on that column.

The root and intermediate pages store:

The leaf level stores:

The row ID in higher levels of the index is used for indexes that allow duplicate keys. If a data modification changes the index key or deletes a row, the row ID positively identifies all occurrences of the key at all index levels.

Figure 4-11: Nonclustered index structure
raster

Nonclustered Indexes and Select Operations

When you select a row using a nonclustered index, the search starts at the root level. sysindexes.root stores the page number for the root page of the nonclustered index.

In Figure 4-12, "Green" is greater than "Bennet," but less than "Karsen," so the pointer to page 1007 is followed. "Green" is greater than "Greane," but less than "Hunter," so the pointer to page 1133 is followed. Page 1133 is the leaf page, showing that the row for "Green" is row 2 on page 1421. This page is fetched, the "2" byte in the offset table is checked, and the row is returned from the byte position on the data page.

Figure 4-12: Selecting rows using a nonclustered index
raster

Nonclustered Index Performance

The query in Figure 4-12 requires the following I/O:

If your applications use a particular nonclustered index frequently, the root and intermediate pages will probably be in cache, so only one or two physical disk I/Os need to be performed.

Nonclustered Indexes and Insert Operations

When you insert rows into a heap that has a nonclustered index and no clustered index, the insert goes to the last page of the table. If the heap is partitioned, the insert goes to the last page on one of the partitions. Then, the nonclustered index is updated to include the new row. If the table has a clustered index, it is used to find the location for the row. The clustered index is updated, if necessary, and each nonclustered index is updated to include the new row.

Figure 4-13 shows an insert into a heap table with a nonclustered index. The row is placed at the end of the table. A row containing the new key value and the row ID is also inserted into the leaf level of the nonclustered index.

Figure 4-13: An insert into a heap table with a nonclustered index
raster

Nonclustered Indexes and Delete Operations

When you delete a row from a table, the query can use a nonclustered index on the column or columns in the where clause to locate the data row to delete, as shown in Figure 4-14. The row in the leaf level of the nonclustered index that points to the data row is also removed. If there are other nonclustered indexes on the table, the rows on the leaf level of those indexes are also deleted.

Figure 4-14: Deleting a row from a table with a nonclustered index
raster

If the delete operation removes the last row on the data page, the page is deallocated and the adjacent page pointers are adjusted in allpages-locked tables. Any references to the page are also deleted in higher levels of the index.

If the delete operation leaves only a single row on an index intermediate page, index pages may be merged, as with clustered indexes. See "Index Page Merges".

There is no automatic page merging on data pages, so if your applications make many random deletes, you may end up with data pages that have only a single row, or a few rows, on a page.

Clustered Indexes on Data-Only-Locked Tables

Clustered indexes on data-only-locked tables are structured like nonclustered indexes. They have a leaf level above the data pages. The leaf level contains the key values and row ID for each row in the table.

Unlike clustered indexes on allpages-locked tables, the data rows in a data-only-locked table are not necessarily maintained in exact order by the key. Instead, the index directs the placement of rows to pages that have adjacent or nearby keys. When a row needs to be inserted in a data-only-locked table with a clustered index, the insert uses the clustered index key just before the value to be inserted. The index pointers are used to find that page, and the row is inserted on the page if there is room. If there is not room, the row is inserted on a page in the same allocation unit, or on another allocation unit already used by the table.

To provide nearby space for maintaining data clustering during inserts and updates to data-only-locked tables, you can set space management properties to provide space on pages (using fillfactor and exp_row_size) or on allocation units (using reservepagegap). See Chapter 31, "Setting Space Management Properties."

Figure 4-15 shows a clustered index being used to direct an insert to a data page in a data-only-locked table.

Figure 4-15: Insert using a clustered index on a data-only-locked table
raster

Index Covering

Index covering is a indexing strategy that can produce dramatic performance improvements when all columns needed by the query are included in the index.

You can create indexes on more than one key. These are called composite indexes. Composite indexes can have up to 31 columns adding up to a maximum 600 bytes. If you create a composite nonclustered index on each column referenced in the query's select list and in any where, having, group by, and order by clauses, the query can be satisfied by accessing only the index. Since the leaf level of a nonclustered index or a clustered index on a data-only-locked table contains the key values for each row in a table, queries that access only the key values can retrieve the information by using the leaf level of the nonclustered index as if it were the actual table data. This is called index covering.

There are two types of index scans that can use an index that covers the query:

For both types of covered queries, the index keys must contain all the columns named in the query. Matching scans have additional requirements. "Choosing Composite Indexes" describes query types that make good use of covering indexes.

Covering Matching Index Scans

This type of index covering lets you skip the last read for each row returned by the query, the read that fetches the data page. For point queries that return only a single row, the performance gain is slight¾just one page. For range queries, the performance gain is larger, since the covering index saves one read for each row returned by the query.

For a covering matching index scan to be used, the index must contain all columns named in the query. Plus, the columns in the where clauses of the query must include the leading column of the columns in the index. For example, for an index on columns A, B, C, D, the following sets can perform matching scans: A, AB, ABC, AC, ACD, ABD, AD, and ABCD. The columns B, BC, BCD, BD, C, CD, or D do not include the leading column and can be used only for nonmatching scans.

When doing a matching index scan, Adaptive Server uses standard index access methods to move from the root of the index to the nonclustered leaf page that contains the first row.

In Figure 4-16, the nonclustered index on lname, fname covers the query. The where clause includes the leading column, and all columns in the select list are included in the index, so the data page does not need to be accessed.

Figure 4-16: Matching index access does not have to read the data row
raster

Covering Nonmatching Index Scans

When the columns specified in the where clause do not include the leading column in the index, but all columns named in the select list and other query clauses (such as group by or having) are included in the index, Adaptive Server saves I/O by scanning the entire leaf level of the index, rather than scanning the table. It cannot perform a matching scan because the first column of the index is not specified.

The query in Figure 4-17 shows a nonmatching index scan. This query does not use the leading columns on the index, but all columns required in the query are in the nonclustered index on lname, fname, emp_id. The nonmatching scan must examine all rows on the leaf level. It scans all leaf level index pages, starting from the first page. It has no way of knowing how many rows might match the query conditions, so it must examine every row in the index. Since it must begin at the first page of the leaf level, it can use the pointer in sysindexes.first rather than descending the index.

Figure 4-17: A nonmatching index scan
raster

Indexes and Caching

"How Adaptive Server Performs I/O for Heap Operations" introduces the basic concepts of the Adaptive Server data cache, and shows how caches are used when reading heap tables. Index pages get special handling in the data cache, as follows:

When a query that uses an index is executed, the root, intermediate, leaf, and data pages are read in that order. If these pages are not in cache, they are read into the MRU end of the cache and are moved toward the LRU end as additional pages are read in. Figure 4-18 shows the data cache just after these 4 pages have been read.

Figure 4-18: Caching used for a point query via a nonclustered index
raster

Each time a page is found in cache, it is moved to the MRU end of the page chain, so the root page and higher levels of the index tend to stay in the cache. Figure 4-19 shows a root page being moved back to the top of the cache for a second query that uses the same index, accessing different intermediate, leaf, and data pages.

Figure 4-19: Finding the root index page in cache
raster

Using Separate Caches for Data and Index Pages

Indexes and the tables they index can use different caches. A System Administrator or table owner can bind a clustered or nonclustered index to one cache and its table to another. Figure 4-20 shows index pages being read into one cache and a data page into a different cache .

Figure 4-20: Caching with separate caches for data and index pages
raster

Index Trips Through the Cache

A special strategy keeps index pages in cache. Data pages make only a single trip through the cache: they are read in at the MRU end of the cache or placed just before the wash marker, depending on the cache strategy chosen for the query. Once the pages reach the LRU end of the cache, the buffer for that page is reused when another page needs to be read into cache.

For index pages, a counter controls the number of trips that an index page can make through the cache. When the counter is greater than 0 for an index page, and it reaches the LRU end of the page chain, the counter is decremented by 1, and the page is placed at the MRU end again. Figure 4-21 shows an index page moving from the LRU end of the cache to the MRU end.

Figure 4-21: Index page recycling in the cache
raster

By default, the number of trips that an index page makes through the cache is set to 0. To change the default, a System Administrator can set the configuration parameter number of index trips. For more information, see "number of index trips" of the System Administration Guide.


The Transaction Log: A Special Heap Table [Table of Contents] Chapter 5: Understanding the Query Optimizer