![]() | ![]() |
Home |
|
|
Utility Guide |
|
| Chapter 3 Using bcp to Transfer Data to and from Adaptive Server |
|
| bcp performance |
|
| Using parallel bulk copy to copy data into a specific partition |
Use parallel bulk copy to copy data in parallel to a specific partition. Parallel bulk copy substantially increases performance during bcp sessions because it can split large bulk copy jobs into multiple sessions and run the sessions concurrently.
To use parallel bulk copy:
The destination table must be partitioned.
Use sp_helpartition to see the number of partitions on the table.
Use alter table ... partition to partition the table, if the table is not already partitioned.
The destination table should not contain indexes because:
If the table has a clustered index, this index determines the physical placement of the data, causing the partition specification in the bcp command to be ignored.
If any indexes exist, bcp automatically uses its slow bulk copy instead of its fast bulk copy mode.
If nonclustered indexes exist on the tables, parallel bulk copy is likely to lead to deadlocks on index pages.
Each partition should reside on a separate physical disk for the best performance.
Before you copy data into your database, you must partition the table destined to contain the data.
Parallel bulk copy can copy in to a table from multiple operating system files. To do so, use:
bcp tablename :partition_number in file_name
Figure 3-1 illustrates the parallel bulk copy process.
Figure 3-1: Copying data into a partitioned table using parallel bulk copy
See the Adaptive Server Enterprise Performance and Tuning Guide for information about partitioning a table.
When using parallel bulk copy to copy data out, you cannot specify which partitions bcp should use.
bcp in and locksWhen you copy in to a table using bcp, and particularly when you copy in to a table using parallel bcp, the copy process acquires the following locks:
An exclusive intent lock on the table
An exclusive page lock on each data page or data row
An exclusive lock on index pages, if any indexes exist
If you are copying in very large tables, and especially if you are using simultaneous copies into a partitioned table, this can involve a very large number of locks.
To avoid running out of locks:
Increase the number of locks.
To estimate the number of locks needed, use:
# of simultaneous batches * (rows_per_batch / (2016/row_length))
To see the row length for a table, use:
1> select maxlen
2> from sysindexes
3> where id = object_id("tablename") and (indid = 0 or indid = 1)See the System Administration Guide for more information about setting the number of locks.
Use the -b batchsize flag to copy smaller batches; the default batch size is 1000 rows.
Run fewer batches concurrently.
Use one of the following methods to copy in data using parallel bulk copy:
Start multiple bcp sessions in the background, being sure to:
Specify the password at the command line.
Use native mode, character mode, or a format file.
You can start bcp as many times as the table is partitioned.
Create and use a format file:
Start bcp in interactive mode.
Answer the prompts.
Create a format file that stores your responses.
Put the process in the background when the copy begins.
Issue the next bcp command, and specify the format file created with the first bcp command.
Start bcp sessions in multiple windows.
The syntax for parallel bulk copy is:
bcp table_name[:partition_number] in file_name -Pmypassword
where:
table_name is the name of the table into which you are copying the data
partition_number is the number of the partition into which you are copying
file_name is the host file that contains the data
mypassword is your password
To copy sorted data in parallel into a specific partition:
Specify the partition by appending a colon (:) plus the partition number to the table name. For example:
publishers:10
The partition you specify must exist before you issue the bcp command.
Split the sorted data into separate files, or delineate the "files" by specifying the first row (-F first_row) and the last row (-L last_row) of the host file.
Note the number of partitions in the table. This number limits the number of parallel bulk copy sessions that you can start.
For example, if a table has four partitions, and you start five parallel bulk copy jobs, only the first four jobs can run in parallel; the fifth job does not start until one of the first four jobs finish.
bcp copies each file or set of line numbers to a separate partition. For example, to use parallel bulk copy to copy in sorted data to mydb..bigtable from four files into four partitions, enter:
bcp mydb..bigtable:1 in file1 -Pmypassword -c & bcp mydb..bigtable:2 in file2 -Pmypassword -c & bcp mydb..bigtable:3 in file3 -Pmypassword -c & bcp mydb..bigtable:4 in file4 -Pmypassword -c &Parallel bulk copy and IDENTITY columns
When you are using parallel bulk copy, IDENTITY columns can cause a bottleneck. As bcp reads in the data, the utility both generates the values of the IDENTITY column and updates the IDENTITY column's maximum value for each row. This extra work may adversely affect the performance improvement that you expected to receive from using parallel bulk copy.
To avoid this bottleneck, you can explicitly specify the IDENTITY starting point for each session.
If you copy sorted data into the table without explicitly specifying the IDENTITY starting point, bcp might not generate the IDENTITY column values in sorted order. Parallel bulk copy reads the information into all the partitions simultaneously and updates the values of the IDENTITY column as it reads in the data.
A bcp statement with no explicit starting point would produce IDENTITY column numbers similar to those shown in Figure 3-2:
Figure 3-2: Producing IDENTITY columns in sorted order
The table has a maximum IDENTITY column number of 119, but the order is no longer meaningful.
If you want Adaptive Server to enforce unique IDENTITY column values, you must run bcp with either the -g or -E parameter.
Use the -g id_start_value flag to specify an IDENTITY starting point for a session in the command line.
The -g parameter instructs Adaptive Server to generate a sequence of IDENTITY column values for the bcp session without checking and updating the maximum value of the table's IDENTITY column for each row. Instead of checking, Adaptive Server updates the maximum value at the end of each batch.
Warning!
Be cautious about creating duplicate identity values inadvertently when you specify identity value ranges that overlap.
To specify a starting IDENTITY value, enter:
bcp [-gid_start_value]
For example, to copy in four files, each of which has 100 rows, enter:
bcp mydb..bigtable in file1 -g100 bcp mydb..bigtable in file2 -g200 bcp mydb..bigtable in file3 -g300 bcp mydb..bigtable in file4 -g400
Using the -g parameter does not guarantee that the IDENTITY column values are unique. To ensure uniqueness, you must:
Know how many rows are in the input files and what the highest existing value is. Use this information to set the starting values with the -g parameter and generate ranges that do not overlap.
In the example above, if any file contains more than 100 rows, the identity values overlap into the next 100 rows of data, creating duplicate identity values.
Make sure that no one else is inserting data that can produce conflicting IDENTITY values.
Use the -E parameter to set the IDENTITY starting point explicitly from the data file.
The -E parameter instructs bcp to prompt you to enter an explicit IDENTITY column value for each row. If the number of inserted rows exceeds the maximum possible IDENTITY column value, Adaptive Server returns an error.
|
|