![]() | ![]() |
Home |
|
|
ASA User's Guide |
|
| Part 1 Working with Databases |
|
| Chapter 10: Importing and Exporting Data |
Chapter 10
Transferring large amounts of data into and out of your database may be necessary in several situations. For example:
Importing an initial set of data into a new database
Exporting data into other applications, such as spreadsheets
Building new copies of a database with a modified structure
Creating extractions of a database
This chapter describes how to import data to and export data from databases, both in text form and in other formats.
Adaptive Server Anywhere supports the import and export of individual tables or a complete database using text files. The interactive SQL utility supports the import and export of data to other formats, such as spreadsheet program formats.
To load and unload tables using text files , use the SQL statements LOAD TABLE and the UNLOAD TABLE. The INPUT statement and the OUTPUT statement are Interactive SQL commands for importing and exporting data using a variety of formats.
You can unload individual tables or a complete database from Sybase Central using the Unload Database wizard. You can also unload individual tables or a complete database using the dbunload command line utility.
This chapter describes the SQL statements LOAD TABLE and UNLOAD TABLE, as well as how to use Sybase Central, dbunload, and Interactive SQL to import and export data.
The LOAD TABLE and UNLOAD TABLE statements import and export text files, with one row per line, and values separated by a delimiter.
The Interactive SQL INPUT and OUTPUT statements support the following file formats:
File Format | Description |
ASCII | A text file, one row per line, with values separated by a delimiter. String values optionally appear enclosed in apostrophes (single quotes). This is the same as the format used by LOAD TABLE and UNLOAD TABLE |
DBASEII | DBASE II format |
DBASEIII | DBASE III format |
DIF | Data Interchange Format |
FIXED | Data records appear in fixed format with the width of each column either the same as defined by the column's type or specified as a parameter |
FOXPRO | FoxPro format |
LOTUS | Lotus workspace format |
WATFILE | WATFILE format. |
An Interactive SQL option exists for both the default input format and one for the default output format. For details on how to specify these options, see INPUT_FORMAT option and OUTPUT_FORMAT option .
Alternately, you can specify the file format on each INPUT statement or OUTPUT statement.
For more information about input and output data formats, see INPUT statement or OUTPUT statement .
Unloading and reloading a database can be a time consuming operation, and can require a large amount of disk space. While the database is being unloaded and reloaded, it is unavailable for use. For these reasons, unloading and reloading a database is not advised in a production environment unless you have a definite goal in mind.
Some reasons for unloading and reloading a database are as follows:
Upgrading your database
Most new features are made available by applying the Upgrade utility. The New Features documentation will state if an unload and reload is required to obtain a particular feature. In general, only features that require a change in database file format require an unload and reload.
Reclaim disk space
Database files do not shrink if data is deleted. Instead, any empty pages are marked as free so that they can be used again, but they are not removed from the database file. Unloading and reloading a database can reclaim disk space if you have deleted a large amount of data from your database and do not anticipate adding more.
Improve performance
Unloading and reloading databases can improve performance for the following reasons:
If data on pages within the database is fragmented, unloading and reloading can eliminate the fragmentation.
Since the data is unloaded and reloaded in order by primary keys, access to related information can be faster, as related rows may appear on the same or adjacent pages.
The procedure for unloading and reloading a database depends on whether the database is involved in replication or not. If the database is involved in replication, you must preserve the transaction log offsets across the operation, as the Message Agent and Replication Agent require this information. If the database is not involved in replication, the process is simpler.
The following procedure should be used only if your database is not involved in replication.
To unload and reload a database not involved in replication:Shut down the database.
Perform a full off-line backup by copying the database and transaction log files to a secure location.
Rebuild the database.
For information on this step, see Rebuilding a database .
You can now allow user access to the database.
If a database is participating in replication, particular care needs to be taken if you wish to unload and reload the databases.
Replication is based on the transaction log. When a database is unloaded and reloaded, the old transaction log is no longer available. For this reason, good backup practices are especially important when participating in replication.
There are two ways of unloading and reloading a database involved in replication. The first method uses the dbunload utility
-aroption to make the unload and reload occur in a way that does not interfere with replication. The second method is a manual method of accomplishing the same task.To unload and reload a database involved in replication:
Shut down the database.
Perform a full off-line backup by copying the database and transaction log files to a secure location.
Rebuild the database using the following command line.
dbunload -c connection_string -ar directory
where connection_string is a connection with DBA authority, directory is the directory used in your replication environment for old transaction logs, and there are no other connections to the database.
For more information, see Unload utility options .
Shut down the new database. Perform validity checks that you would usually perform after restoring a database.
Start the database using any production switches you need. You can now allow user access to the reloaded database.
If the above procedure does not meet your needs, you can manually adjust the transaction log offsets. The following procedure describes how to carry out that operation.
To unload and reload a database involved in replication, with manual intervention:Shut down the database.
Perform a full off-line backup by copying the database and transaction log files to a secure location.
Run the dbtran utility to display the starting offset and ending offset of the database's current transaction log file. Note the ending offset for later use.
Rename the current transaction log file so that it is not modified during the unload process, and place this file in the off-line directory.
Rebuild the database.
For information on this step, see Rebuilding a database .
Shut down the new database.
Erase the current transaction log file for the new database.
Use dblog on the new database with the ending offset noted in step 3 as the -z parameter, and also set the relative offset to zero.
dblog -x 0 -z 137829 database-name.db
When you run the Message Agent, provide it with the location of the original off-line directory on its command line.
Start the database. You can now allow user access to the reloaded database.
The central step of the unload and reload procedure is also called rebuilding the database. You can carry this operation out from Sybase Central or using command line utilities.
If your database is involved in replication, there are special considerations when rebuilding databases. For more information, see
To rebuild a database (Sybase Central):Start Sybase Central. In the left pane, open the Adaptive Server Anywhere plugin.
Open the Utilities folder. Double click Rebuild Database.
Follow the instructions in the Wizard.
If you are using the command-line utilities, you have the choice of a straightforward and secure method that does not create a temporary copy of the data on disk, or a more complex method that has better performance.
To rebuild a database when security is a concern (Command line):Execute the dbunload command-line utility using the -an command-line switch to create a new database.
dbunload -c "connection_string" -an newdemo.db"
The connection_string connects to the database to be unloaded, with DBA authority.
To rebuild a database when performance is a concern (Command line):Start the database.
Unload the database using the dbunload utility.
For information on dbunload and its options, see The DBUNLOAD command-line utility .
Shut down the existing database. This database and any log file created in this and the previous step is no longer needed.
Initialize a new database using the dbinit utility.
For information on dbinit options, see The DBINIT command-line utility . For information on initializing databases, see Initializing a database .
Reload the data into the new database.
Using Interactive SQL, execute the reload.sql script held in your unload directory.
dbisql -c "uid=dba;pwd=sql" reload.sql
You can export data from your database using the:
UNLOAD TABLE statement, for efficient export of text files
Interactive SQL OUTPUT statement, for slower but more flexible export to a variety of file formats
DBUNLOAD utility, for text export of more than one table
This section describes each of these methods, and includes some tips for dealing with NULL output.
The UNLOAD TABLE statement efficiently exports data from a database table to a text file. You must have SELECT permission on the table to use the UNLOAD TABLE statement.
The following statement unloads the department table from the sample database into the file dept.txt in the server's current working directory. If you are running against a network server, the command unloads the data into a file on the server machine, not the client machine.
UNLOAD TABLE department TO 'dept.txt'
The dept.txt file has the following contents:
100,'R & D',501 200,'Sales',902 300,'Finance',1293 400,'Marketing',1576 500,'Shipping',703
Each row of the table is output on a single line of the output file.
No column names are exported.
The columns are separated, or delimited, by a comma. The delimiter character can be changed using the DELIMITED BY clause.
The fields are not fixed-width fields. Only the characters in each entry are exported, not the full width of the column.
The character data in the dept_name column is enclosed in single quotes. The single quotes can be turned off using the QUOTES clause.
The data exports in order by primary key values. This makes reloading quicker. You can export data in the order it's stored in using the ORDER OFF clause.
The file name is relative to the server's current directory, not the current directory of the client application. Also, the file name passes to the server as a string. Using escape backslash characters in the file name prevents misinterpretation if a directory of file name begins with an n (\n is a newline character) or any other special characters. For example, the following statement unloads a table into the file c:\temp\newfile.dat:
UNLOAD TABLE employee TO 'c:\\temp\\newfile.dat'
For more information on the syntax, see UNLOAD TABLE statement .
The following example uses explicit settings for the DELIMITED BY and QUOTES clauses:
UNLOAD TABLE department TO 'dept.txt' DELIMITED BY '$' QUOTES OFF
The resulting dept.txt file has the following contents:
100$R & D$501 200$Sales$902 300$Finance$1293 400$Marketing$1576 500$Shipping$703
If a delimiter character appears within a value and the QUOTES option is turned off, the character is replaced by its hexadecimal value preceded by
\x. For example:
UNLOAD TABLE department TO 'dept.txt' DELIMITED BY '&' QUOTES OFF
yields the following output file:
100&R \x26 D&501 200&Sales&902 300&Finance&1293 400&Marketing&1576 500&Shipping&703
You can export queries to a file from Interactive SQL using either the OUTPUT statement or by redirecting output.
Using the output statementYou can export data from a database to a variety of file formats using the Interactive SQL OUTPUT statement. This statement exports the results of the current query (the one displayed in the Interactive SQL Data window) and puts the results into a specified file. You can specify the output format on the output command.
For example, the following commands extract the employee table to a dBaseIII format file:
SELECT * FROM employee; OUTPUT TO employee.dbf FORMAT dbaseiii;Using output redirection
You can use output redirection to export data as an alternative to the OUTPUT statement.
You can redirect the output of any command to a file or device by putting the
>#redirection symbol anywhere on the command. A file name must follow the redirection symbol. For example:
SELECT * FROM employee ># filename
Do not enclose the file name in quotation marks.
Output redirection is most useful on the SELECT statement. You can use the SET OUTPUT_FORMAT command to control the format of the output file.
The >& redirection symbol redirects all output including error messages and statistics for the command on which it appears. For example, the following outputs the SELECT statement to the file, followed by the output from the SELECT statement and some statistics pertaining to the command:
SELECT * FROM employee >& filename
Do not enclose the file name in quotation marks.
The
>&redirection is useful for generating a log of what happens during a READ command. The statistics and errors of each command appear after the command in the redirected output file.
If two
>characters appear in a redirection symbol instead of one (
>>#or
>>&.), the output is appended to the specified file instead of replacing the contents of the file. For output from the SELECT statement, headings are output if and only if the output starts at the beginning of the specified file and the output format supports headings.
Most commonly, users want to extract data for use in other software products.
Since the other software package may not understand NULL values, there is a Interactive SQL option (NULLS) that allows you to choose how NULL values are output. Alternatively, you can use the IFNULL function to output a specific value whenever there is a NULL value.
For information on setting Interactive SQL options, see SET OPTION statement .
The dbunload utility is used to unload an entire database in ASCII comma-delimited format and to create the necessary Interactive SQL command files to completely recreate your database. This may be useful for creating extractions, creating a backup of your database, or building new copies of your database with the same or a slightly modified structure.
If you want to rearrange your tables in the database, you can use dbunload to create the necessary command files and modify them as needed.
For Windows 3.x, the DBUNLOAD executable is named dbunloaw.exe.
For a full description of dbunload utility command-line switches, see the section The Unload utility .
The dbunload utility can also export a list of tables, rather than the entire database. This is useful for retrieving data from a corrupted database that cannot be entirely unloaded.
The following statement unloads the data from the sample database (assumed to be running on the default database server with the default database name) into a set of files in the c:\temp directory. A command file to rebuild the database from the data files is created with the default name reload.sql in the current directory.
dbunload -c "dbn=asademo;uid=dba;pwd=sql" c:\temp
You can import data into your database using:
The LOAD TABLE statement, for efficient import of text files.
The Interactive SQL INPUT statement, for slower but more flexible import of a variety of file formats.
Interactive input using the INSERT statement or the Interactive SQL INPUT statement.
This section describes each of these methods, and also describes some tips for dealing with incompatible data structure and conversion errors.
The LOAD TABLE statement efficiently imports data from a text file into a database table. The table must exist and have the same number of columns as the input file has fields, defined on compatible data types. To use the LOAD TABLE statement, the user must have ALTER permission on the table.
If the department table had all its rows deleted, the following statement would load the data from the file dept.txt into the department table:
LOAD TABLE department FROM 'dept.txt'
The LOAD TABLE statement appends the contents of the file to the existing rows of the table; it does not replace the existing rows in the table. You can use the TRUNCATE TABLE statement to remove all the rows from a table.
Neither the TRUNCATE TABLE statement nor the LOAD TABLE statement fires triggers, including referential integrity actions such as cascaded deletes.
The LOAD TABLE statement has many of the same options as the UNLOAD TABLE statement.
For a description of column delimiters, use of quotes, and file names, see Unloading data using the UNLOAD TABLE statement .
The LOAD TABLE statement has the additional STRIP clause. The default setting (STRIP ON) strips trailing blanks from values before they are inserted. To keep trailing blanks, use the STRIP OFF clause in your LOAD TABLE statement.
For a full description of the LOAD TABLE statement syntax, see LOAD TABLE statement .
For a full description of the UNLOAD TABLE statement syntax, see UNLOAD TABLE statement
You can load data with the same structure as existing database tables into your database from a file using the Interactive SQL INPUT statement. Although less efficient than the LOAD TABLE statement (which works only for text files), the Interactive SQL INPUT statement does support several different file formats.
You can enter the INPUT command in Interactive SQL as follows:
INPUT INTO t1 FROM file1 FORMAT ASCII; INPUT INTO t2 FROM file2 FORMAT FIXED COLUMN WIDTHS (5, 10, 40, 40 ); ...
These statements could be put in a command file which Interactive SQL can execute.
For more information about command files, see the tutorial chapter Command Files .
You can use two commands to input data interactively. For example, you can use the insert command to insert a single row at a time:
INSERT INTO T1 VALUES ( ... )
Or you can use the input command which gives you a full screen to type in data in the current input format (controlled by the Interactive SQL INPUT_FORMAT option). For example:
INPUT INTO T1 PROMPT
When you load data from external sources, there may be errors in the data. For example, there may be dates that are not valid dates and numbers that are not valid numbers. The CONVERSION_ERROR database option allows you to ignore conversion errors by converting them to NULL values.
For information on setting Interactive SQL database options, see SET OPTION statement .
The structure of the data you want to load into a table does not always match the structure of the destination table itself. For example, the column data types may be different or in a different order, or there may be extra values in the import data that do not match columns in the destination table.
To load data with a different structure:Using the LOAD TABLE statement, load the data into a temporary table with a structure matching that of the input file.
Use the INSERT statement with a FROM SELECT clause to extract and summarize data from the temporary table and put it into one or more of the permanent database tables.
If you want to load a set of data once and for all, use the DECLARE TEMPORARY TABLE statement to make the temporary table. A declared temporary table exists only for the duration of a connection or, if defined inside a compound statement, for the duration of the compound statement.
If you load data of a similar structure repeatedly, use the CREATE TABLE statement to make the temporary table, and be sure to specify a global temporary table. The definition of the temporary table is held in the database permanently, but the rows exist only within a given connection.
Although loading large volumes of data into a database can be very time consuming, there are a few things you can do to save time:
If you use the LOAD TABLE statement, then bulk mode is not necessary.
If you are using the INPUT command, run Interactive SQL or the client application on the same machine as the server. Loading data over the network adds extra communication overhead. This might mean loading new data during off hours.
Place data files on a separate physical disk drive from the database file. This could avoid excessive disk head movement during the load.
Increase the size of the database cache. Eliminate disk cache in favor of database cache if the machine is a dedicated database server.
For a description of how to control the cache size from the server command line option see The database server .
If you are using the INPUT command, start the server with the
-bswitch for bulk operations mode. In this mode, the server does not keep a rollback log or a transaction log, it does not perform an automatic COMMIT before data definition commands, and it does not lock any records.
Without a rollback log, you cannot use savepoints and aborting a command always causes transactions to roll back. Without automatic COMMIT, a ROLLBACK undoes everything since the last explicit COMMIT.
Without a transaction log, there is no log of the changes. You should back up the database file before and after using bulk operations mode because, in this mode, your database is not protected against media failure. For more information, see Backup and Data Recovery .
The server allows only one connection when you use the
-bswitch.
If you have data that requires many commits, running with the
-boption may slow database operation. At each COMMIT, the server carries out a checkpoint; this frequent checkpointing can slow the server.
Extend the size of the database file, as described in ALTER DBSPACE statement . This command allows a database file to be extended in large amounts before the space is required, rather than the normal 32 pages at a time when the space is needed. As well as improving performance for loading large amounts of data, it also serves to keep the database files more contiguous within the file system.
|
|