![]() | ![]() |
Home |
|
|
Utility Guide |
|
| Chapter 3 Using bcp to Transfer Data to and from Adaptive Server |
|
| bcp performance |
Keeping indexes and triggers on a table causes the bulk copy utility to use slow bcp automatically. However, slow bcp can fill the transaction log very quickly.
When you are copying a large number of rows, the performance penalty and log space requirements for using slow bcp can be severe.
For extremely large tables, using slow bcp is not an option because its detailed log makes it much too slow.
To improve the performance of bcp:
Use partitioned tables. Several bcp sessions with a partitioned table can reduce dramatically the time required to copy the data. However, such performance improvements are more noticeable in fast bcp than in slow bcp.
Use bcp in parallel to increase performance dramatically. Parallel bulk copy can provide balanced data distribution across partitions. For more information, see "Using parallel bulk copy to copy data into a specific partition".
The existence of indexes and triggers on tables affects transfer speed. When you use bcp on such tables, bcp automatically uses its slow mode, which logs data inserts in the transaction log. These logged inserts can cause the transaction log to become very large.
To control this data excess and ensure that the database is fully recoverable in the event of a failure, you can back up the log with dump transaction.
bcp does not fire any trigger that exists on the target table.
Fast bcp logs only the page allocations. For copying data in, bcp is fastest if your database table has no indexes or triggers.
However, if you used fast bcp to make data inserts, which fast bcp does not log, you cannot back up (dump) the transaction log to a device. The changes are not in the log, and a restore cannot recover nonexistent backup data. The requested backup (dump transaction) produces an error message that instructs you to use dump database instead. This restriction remains in force until a dump database successfully completes.
For more information about dump database and dump transaction, see the Adaptive Server Enterprise System Administration Guide, and the Adaptive Server Enteprise Reference Manual.
The bcp program is optimized to load data into tables that do not have indexes or triggers associated with them. It loads data into tables without indexes or triggers at the fastest possible speed, with a minimum of logging. Page allocations are logged, but the insertion of rows is not.
When you copy data into a table that has one or more indexes or triggers, a slower version of bcp is automatically used, which logs row inserts. This includes indexes implicitly created using the unique integrity constraint of a create table statement. However, bcp does not enforce the other integrity constraints defined for a table.
By default, the select into/bulkcopy/pllsort option is false (off) in newly created databases. To change the default situation, turn this option on in the model database.
The log can grow very large during slow bcp because bcp logs inserts into a table that has indexes or triggers. After the bulk copy completes, back up your database with dump database, then truncate the log with dump transaction after the bulk copy completes and after you have backed up your database with dump database.
While the select into/bulkcopy/pllsort option is on, you cannot dump the transaction log. Issuing dump transaction produces an error message instructing you to use dump database instead.
Warning!
Be certain that you dump your database before you turn off the select into/bulkcopy/pllsort flag. If you have inserted unlogged data into your database, and you then perform a dump transaction before performing a dump database, you will not be able to recover your data.
Fast bcp runs more slowly while a dump database is taking place.
Table 3-1 shows which version bcp uses when copying in, the necessary settings for the select into/bulkcopy/pllsort option, and whether the transaction log is kept and can be dumped.
select into/bulkcopy/pllsort | on | off |
fast bcp (no indexes or triggers on target table) | OK dump transaction prohibited | bcp prohibited dump transaction |
slow bcp (one or more indexes or triggers) | OK dump transaction prohibited | OK dump transaction OK |
The performance penalty for copying data into a table that has indexes or triggers in place can be severe. If you are copying in a very large number of rows, it may be faster to drop all the indexes and triggers beforehand with drop index (or alter table, for indexes created as a unique constraint) and drop trigger; set the database option; copy the data into the table; re-create the indexes and triggers; and then dump the database. Remember to allocate disk space for the construction of indexes and triggers: about 2.2 times the amount of space needed for the data.
To allow a user to copy in data using fast bcp, either a System Administrator or the Database Owner first must use sp_dboption to set select into/bulkcopy/pllsort to true on the database that contains the target table or tables. If the option is set to false when a user tries to use fast bcp to copy data into a table without indexes or triggers, Adaptive Server generates an error message.
You do not need to set the select into/bulkcopy/pllsort option to true to copy out data from, or to copy in data to a table that has indexes or triggers. Slow bcp always copies tables with indexes or triggers and logs all inserts.
By default, the select into/bulkcopy/pllsort option is set to false (off) in newly created databases. To change the default setting for future databases, turn this option on (set to true) in the model database.
If you are copying a very large number of rows, you must have 1.2 times the amount of space needed for the data and enough space for the server to reconstruct a clustered index.
If space is available, you can use drop index and drop trigger to drop all the indexes and triggers beforehand.
If you do not have enough space for the server to sort the data and build the index or indexes, use slow bcp.
Table 3-2 summarizes the steps for copying in data to Adaptive Server using fast bcp.
Step | Who can do it |
Use sp_dboption to set select into/bulkcopy/pllsort to true. Run checkpoint in the database that was changed. | System Administrator or Database Owner |
Have enough space to re-create any indexes and triggers on the table. Drop the indexes and triggers on the table. | Table owner |
Have insert permission on the table. | Granted by the table owner |
Perform the copy with bcp. | Any user with insert permission |
Re-create the indexes and triggers. | Table owner |
Reset sp_dboption, if desired, and run checkpoint in the database that was changed. | System Administrator or Database Owner |
Use dump database to back up the newly inserted data. | System Administrator, Operator, or Database Owner |
Run stored procedures or queries to determine whether any of the newly loaded data violates rules. | Table owner or stored procedure owner |
In certain circumstances, you can improve bcp performance dramatically by executing several bcp sessions with a partitioned table.
Partitioned tables improve insert performance by reducing lock contention and by distributing I/O over multiple devices. bcp performance with partitioned tables is improved primarily because of this distributed I/O.
When you execute a bcp session on a partitioned table, consider:
A partitioned table improves performance only when you are bulk copying in to the table.
The performance of slow bcp does not improve as much with partitioned tables. Instead, drop all indexes and triggers and use fast bcp, as described in Table 3-2, to increase performance.
Network traffic can quickly become a bottleneck when multiple bcp sessions are being executed. If possible, use a local connection to the Adaptive Server to avoid this bottleneck.
To copy data into a partitioned heap table, you can either:
Copy the data randomly without regard to the partition to which data is copied, or
Copy the data into a specific partition
If the table has a clustered index, bcp runs in slow mode and allows the index to control the placement of rows.
To copy data randomly into partitioned tables when using multiple bcp sessions, you must:
Configure the table with as many partitions and physical devices as you require for your system.
For more information, see the Performance and Tuning Guide, and "Using parallel bulk copy to copy data into a specific partition" of this manual.
Make sure Adaptive Server is configured with enough locks to support multiple bcp sessions. For information on configuring locks, see the System Administration Guide.
Remove the triggers and indexes on the table and enable fast bcp. See "Using fast or slow bcp" for instructions.
If you use slow bcp, performance may not improve significantly after you remove the triggers and indexes. Also, if the table contains indexes, you may experience deadlocks on the index pages.
Divide the bcp input file into as many files of equal size as the number of planned simultaneous bcp sessions.
You also can use the -F first_row and -L last_row options to specify the start and end of each "input file."
Execute the bcp sessions with separate files in parallel on the local Adaptive Server machine.
For example, on UNIX platforms, you can execute different sessions in different shell windows or start individual bcp sessions in the background.
Read the Performance and Tuning Guide for a detailed description of copying data into partitioned tables.
Monitoring bcp sessions with dbcc checktable and sp_helpsegmentIf you do not specify which partition the bcp sessions should use, Adaptive Server randomly assigns the multiple bcp sessions to the table's available partitions. If this random assignment occurs, be sure to monitor the partitions to ensure that the process has evenly distributed the inserts by using either of the following:
dbcc checktable - to periodically to check the total page counts for each partition
sp_helpsegment or sp_helpartition - to perform a similar check, but without locking the database objects
For more information about dbcc checktable, see the System Administration Guide. For more information about sp_helpsegment and sp_helpartition, see the Reference Manual.
For more information about table partitions, see the Performance and Tuning Guide.
Reducing logging by increasing page allocationsIf you are using fast bcp, consider that each bcp in batch requires the page manager to allocate one or more extents. Each such allocation generates a single log record.
Use the number of preallocated extents configuration parameter to specify how many extents Adaptive Server is to allocate through the page manager.
Valid values for the number of preallocated extents configuration parameter are from 0 to 31; the default value is 2.
You must restart Adaptive Server to change the value.
When performing large bcp operations, increase this number to prevent the page allocations from filling the log.
Set this value to 0 to prevent large extent allocations, so that the page manager performs only single-page allocations.
Adaptive Server may allocate more pages than are actually needed, so keep the value small when space is limited. These pages are deallocated at the end of the batch.
For more information, see the System Administration Guide.
Use parallel bulk copy to copy data in parallel to a specific partition. Parallel bulk copy substantially increases performance during bcp sessions because it can split large bulk copy jobs into multiple sessions and run the sessions concurrently.
To use parallel bulk copy:
The destination table must be partitioned.
Use sp_helpartition to see the number of partitions on the table.
Use alter table ... partition to partition the table, if the table is not already partitioned.
The destination table should not contain indexes because:
If the table has a clustered index, this index determines the physical placement of the data, causing the partition specification in the bcp command to be ignored.
If any indexes exist, bcp automatically uses its slow bulk copy instead of its fast bulk copy mode.
If nonclustered indexes exist on the tables, parallel bulk copy is likely to lead to deadlocks on index pages.
Each partition should reside on a separate physical disk for the best performance.
Before you copy data into your database, you must partition the table destined to contain the data.
Parallel bulk copy can copy in to a table from multiple operating system files. To do so, use:
bcp tablename :partition_number in file_name
Figure 3-1 illustrates the parallel bulk copy process.
Figure 3-1: Copying data into a partitioned table using parallel bulk copy
See the Adaptive Server Enterprise Performance and Tuning Guide for information about partitioning a table.
When using parallel bulk copy to copy data out, you cannot specify which partitions bcp should use.
bcp in and locksWhen you copy in to a table using bcp, and particularly when you copy in to a table using parallel bcp, the copy process acquires the following locks:
An exclusive intent lock on the table
An exclusive page lock on each data page or data row
An exclusive lock on index pages, if any indexes exist
If you are copying in very large tables, and especially if you are using simultaneous copies into a partitioned table, this can involve a very large number of locks.
To avoid running out of locks:
Increase the number of locks.
To estimate the number of locks needed, use:
# of simultaneous batches * (rows_per_batch / (2016/row_length))
To see the row length for a table, use:
1> select maxlen
2> from sysindexes
3> where id = object_id("tablename") and (indid = 0 or indid = 1)See the System Administration Guide for more information about setting the number of locks.
Use the -b batchsize flag to copy smaller batches; the default batch size is 1000 rows.
Run fewer batches concurrently.
Use one of the following methods to copy in data using parallel bulk copy:
Start multiple bcp sessions in the background, being sure to:
Specify the password at the command line.
Use native mode, character mode, or a format file.
You can start bcp as many times as the table is partitioned.
Create and use a format file:
Start bcp in interactive mode.
Answer the prompts.
Create a format file that stores your responses.
Put the process in the background when the copy begins.
Issue the next bcp command, and specify the format file created with the first bcp command.
Start bcp sessions in multiple windows.
The syntax for parallel bulk copy is:
bcp table_name[:partition_number] in file_name -Pmypassword
where:
table_name is the name of the table into which you are copying the data
partition_number is the number of the partition into which you are copying
file_name is the host file that contains the data
mypassword is your password
To copy sorted data in parallel into a specific partition:
Specify the partition by appending a colon (:) plus the partition number to the table name. For example:
publishers:10
The partition you specify must exist before you issue the bcp command.
Split the sorted data into separate files, or delineate the "files" by specifying the first row (-F first_row) and the last row (-L last_row) of the host file.
Note the number of partitions in the table. This number limits the number of parallel bulk copy sessions that you can start.
For example, if a table has four partitions, and you start five parallel bulk copy jobs, only the first four jobs can run in parallel; the fifth job does not start until one of the first four jobs finish.
bcp copies each file or set of line numbers to a separate partition. For example, to use parallel bulk copy to copy in sorted data to mydb..bigtable from four files into four partitions, enter:
bcp mydb..bigtable:1 in file1 -Pmypassword -c & bcp mydb..bigtable:2 in file2 -Pmypassword -c & bcp mydb..bigtable:3 in file3 -Pmypassword -c & bcp mydb..bigtable:4 in file4 -Pmypassword -c &Parallel bulk copy and IDENTITY columns
When you are using parallel bulk copy, IDENTITY columns can cause a bottleneck. As bcp reads in the data, the utility both generates the values of the IDENTITY column and updates the IDENTITY column's maximum value for each row. This extra work may adversely affect the performance improvement that you expected to receive from using parallel bulk copy.
To avoid this bottleneck, you can explicitly specify the IDENTITY starting point for each session.
If you copy sorted data into the table without explicitly specifying the IDENTITY starting point, bcp might not generate the IDENTITY column values in sorted order. Parallel bulk copy reads the information into all the partitions simultaneously and updates the values of the IDENTITY column as it reads in the data.
A bcp statement with no explicit starting point would produce IDENTITY column numbers similar to those shown in Figure 3-2:
Figure 3-2: Producing IDENTITY columns in sorted order
The table has a maximum IDENTITY column number of 119, but the order is no longer meaningful.
If you want Adaptive Server to enforce unique IDENTITY column values, you must run bcp with either the -g or -E parameter.
Use the -g id_start_value flag to specify an IDENTITY starting point for a session in the command line.
The -g parameter instructs Adaptive Server to generate a sequence of IDENTITY column values for the bcp session without checking and updating the maximum value of the table's IDENTITY column for each row. Instead of checking, Adaptive Server updates the maximum value at the end of each batch.
Warning!
Be cautious about creating duplicate identity values inadvertently when you specify identity value ranges that overlap.
To specify a starting IDENTITY value, enter:
bcp [-gid_start_value]
For example, to copy in four files, each of which has 100 rows, enter:
bcp mydb..bigtable in file1 -g100 bcp mydb..bigtable in file2 -g200 bcp mydb..bigtable in file3 -g300 bcp mydb..bigtable in file4 -g400
Using the -g parameter does not guarantee that the IDENTITY column values are unique. To ensure uniqueness, you must:
Know how many rows are in the input files and what the highest existing value is. Use this information to set the starting values with the -g parameter and generate ranges that do not overlap.
In the example above, if any file contains more than 100 rows, the identity values overlap into the next 100 rows of data, creating duplicate identity values.
Make sure that no one else is inserting data that can produce conflicting IDENTITY values.
Use the -E parameter to set the IDENTITY starting point explicitly from the data file.
The -E parameter instructs bcp to prompt you to enter an explicit IDENTITY column value for each row. If the number of inserted rows exceeds the maximum possible IDENTITY column value, Adaptive Server returns an error.
|
|