Oracle8(TM) Server Utilities Release 8.0 A54652-01 |
|
This chapter describes SQL*Loader's conventional and direct path load methods. The following topics are covered:
For an example of loading with using the direct path load method, see "Case 6: Loading Using the Direct Path Load Method" on page 4-20. The other cases use the conventional path load method.
Note: If you are using Trusted Oracle, see the Trusted Oracle documentation for information about using SQL*Loader in that environment.
SQL*Loader provides two methods for loading data:
Conventional path loads execute SQL INSERT statement(s) to populate table(s) in an Oracle database. Direct path loads eliminate much of the Oracle database overhead by formatting Oracle data blocks and writing the data blocks directly to the database files. The direct load, therefore, does not compete with other users for database resources so it can usually load data at nearly disk speed. Certain considerations, inherent to this method of access to database files, such as restrictions, security and backup implications, are discussed in this chapter.
Conventional path loads (the default) use the SQL INSERT statement and a bind array buffer to load data into database tables. This method is used by all Oracle tools and applications.
When SQL*Loader performs a conventional path load, it competes equally with all other processes for buffer resources. This can slow the load significantly. Extra overhead is added as SQL commands are generated, passed to Oracle, and executed.
Oracle looks for partially filled blocks and attempts to fill them on each insert. Although appropriate during normal use, this can slow bulk loads dramatically.
By definition, conventional path uses SQL INSERT statements. For conventional path loading of a single partition, SQL*Loader makes use of the partition-extended syntax of the INSERT statement which has the following form:
INSERT into table T partition (P) VALUES ...
The SQL layer of the ORACLE kernel determines if the row being inserted maps to the specified partition. If the row does not map to the partition, the row is rejected, and the loader log file records an appropriate error message.
Because the direct path is faster than the conventional path, it is desirable to use the direct path. But there are times when the conventional path is preferred, or when the direct path cannot be used due to restrictions. You should use the conventional path in the following situations:
To use the direct path (excepting parallel loads), SQL*Loader must have exclusive write access to the table and exclusive read-write access to any indexes.
You cannot load data through the direct path with SQL*Net; unless both systems belong to the same family of computers, and both are using the same character set. Even then, load performance can be significantly impaired by network overhead.
Direct path does not support loading of clustered tables.
On the direct path, the existing index is copied when it is merged with the new index keys. If the existing index is very large and the number of new keys is very small, then the index copy time can offset the time saved by loading the data with the direct path.
Because these constraints cannot be applied to rows loaded on the direct path, they are disabled for the duration of the load. Then they are applied to the whole table when the load completes. The costs could outweigh the savings for a very large table and a small number of new rows.
SQL functions are not available on the direct path. For more information on the SQL functions, see "Applying SQL Operators to Fields" on page 5-82.
Instead of filling a bind array buffer and passing it to Oracle with a SQL INSERT command, a direct path load parses the input data according to the description given in the loader control file, converts the data for each input field to its corresponding Oracle column datatype, and builds a column array structure (an array of <length, data> pairs.) The loader then uses the column array structure to format Oracle data blocks and build index keys. The newly formatted database blocks are then written directly to the database (multiple blocks per I/O request using asynchronous writes if the host platform supports asynch I/O.)
Internally, multiple buffers are used for the formatted blocks. While one buffer is being filled, one or more buffers are being written if asynchronous I/O is available on the host platform. Overlapping computation with I/O increases load performance.
Figure 8-1 on page 8 - 5 shows how conventional and direct path loads perform database writes.
When loading a partitioned table, the direct path load engine in the Oracle kernel partitions the rows, and maintains indexes (which can also be partitioned.) Note that direct path loading of a partitioned table can be quite resource intensive for tables with many partitions.
When loading a single partition of a partitioned table, the direct path load engine in the Oracle kernel partitions the rows and rejects any rows which do not map to the partition specified in the loader control file. Local index partitions which correspond to the data partition being loaded are maintained by the loader. Global indexes are not maintained on single partition direct path loads.
While loading a partition of a partitioned table, DML operations on, and direct path loads of other partitions in the table are allowed.
Although direct path loads minimize the database processing, several calls to the Oracle server are made at the beginning and end of the load to initialize and finish the load, respectively. The necessary DML locks are required during load initialization, and released when the load finishes. Also, during the load the following operations occur: index keys are built and put into a sort, space management routines are used to get new extents when needed and to adjust the high-water mark for a data save point. The high-water mark is described in "Data Saves" on page 8-12.
The direct path method is faster than the conventional path for the following reasons:
See "Instance Recovery and Direct Path Loads" on page 8-13.
You should use a direct path load in the following situations:
In addition to the general load conditions described in "Conventional Path Load versus Direct Path Load" on page 3-16, the following conditions must be satisfied to use the direct path load method:
To check for this condition, use the SQL*DBA command MONITOR TABLE to find the object ID for the table(s) you want to load. Then use the command MONITOR LOCK to see if there are any locks on the table.
In addition to the above listed restrictions, loading of a single partition has the following restrictions:
All integrity constraints are enforced during direct path loads, although not necessarily at the same time. NOT NULL constraints are enforced during the load. Records that fail these constraints are rejected.
UNIQUE constraints are enforced both during and after the load. A record which violates a UNIQUE constraint is not rejected (the record is not available in memory when the constraint violation is detected.)
Integrity constraints that depend on other rows or tables, such as referential constraints, are disabled before the direct path load and must be re-enabled afterwards. If REENABLE is specified, SQL*Loader can re-enable them automatically at the end of the load. When the constraints are re-enabled, the entire table is checked. Any rows that fail this check are reported in the specified error log. See the section in this chapter called "Direct Loads, Integrity Constraints, and Triggers".
DEFAULT column specifications defined in the database are not available when loading on the direct path. Fields for which default values are desired must be specified with the DEFAULTIF clause, described on "DEFAULTIF Clause" on page 5-74. If a DEFAULTIF clause is not specified, and the field is NULL, then a NULL value is inserted into the database.
You can load data into a synonym for a table during a the direct path load, but the synonym must point directly to a table. It cannot be a synonym for a view or a synonym for another synonym.
A SQL*Loader direct load can only be done for a database of the same version. For example, you cannot do a SQL*Loader Version 7.1.2 direct path load to load into a Oracle Version 7.1.3 database.
This section explains you how to use SQL*Loader's direct path load.
To prepare the database for direct path loads, you must run the setup script, CATLDR.SQL to create the necessary views. You need only run this script once for each database you plan to do direct loads to. This script can be run during database installation if you know then that you will be doing direct loads.
To start SQL*Loader in direct load mode, the parameter DIRECT must be set to TRUE on the command line or in the parameter file, if used, in the format:
DIRECT=TRUE
See "Case 6: Loading Using the Direct Path Load Method" on page 4-20 for an example.
During a direct path load, performance is improved by using temporary storage. After each block is formatted, the new index keys are put to a sort (temporary) segment. The old index and the new keys are merged at load finish time to create the new index. The old index, sort (temporary) segment, and new index segment all require storage until the merge is complete. Then the old index and temporary segment are removed.
Note that, during a conventional path load, every time a row is inserted the index is updated. This method does not require temporary storage space, but it does add processing time.
Performance on systems with limited memory can also be improved by using the SINGLEROW option. For more information see "SINGLEROW Option" on page 5-40.
Note: If, during a direct load, you have specified that the data is to be pre-sorted and the existing index is empty, a temporary segment is not required, and no merge occurs-the keys are put directly into the index. See "Maximizing Performance of Direct Path Loads" on page 8-15 for more information.
When multiple indexes are built, the temporary segments corresponding to each index exist simultaneously, in addition to the old indexes. The new keys are then merged with the old indexes, one index at a time. As each new index is created, the old index and the corresponding temporary segment are removed.
The formula for calculating the amount of space needed for storing the index itself can be found in the chapter "Managing Database Files" of Oracle8 Server Administrator's Guide. Remember that two indexes exist until the load is complete: the old index and the new index.
The amount of temporary segment space needed for storing the new index keys (in bytes) can be estimated using the following formula:
1.3 * key_storage
where:
key_storage = (number_of_rows) *
( 10 + sum_of_column_sizes + number_of_columns )
The columns included in this formula are the columns in the index. There is one length byte per column, and 10 bytes per row are used for a ROWID and additional overhead.
The constant 1.3 reflects the average amount of extra space needed for sorting. This value is appropriate for most randomly ordered data. If the data arrives in exactly opposite order, twice the key-storage space is required for sorting, and the value of this constant would be 2.0. That is the worst case.
If the data is fully sorted, only enough space to store the index entries is required, and the value of this constant reduces to 1.0. See "Pre-sorting Data for Faster Indexing" on page 8-16 for more information.
SQL*Loader will leave indexes in Index Unusable state when the data segment being loaded becomes more up-to-date than the index segments that index it.
Any SQL statement that tries to use an index that is in Index Unusable state returns an error. The following conditions cause the direct path option to leave an index or a partition of a partitioned index in Index Unusable state:
To determine if an index is in Index Unusable state, you can execute a simple query:
SELECT INDEX_NAME, STATUS
FROM USER_INDEXES
WHERE TABLE_NAME = 'tablename';
To determine if an index partition is in unusable state,
SELECT INDEX_NAME,
PARTITION_NAME,
STATUS FROM USER_IND_PARTITIONS
WHERE STATUS != 'VALID';
If you are not the owner of the table, then search ALL_INDEXES or DBA_INDEXES instead of USER_INDEXES. For partitioned indexes, search ALL_IND_PARTITIONS and DBA_IND_PARTITIONS instead of USER_IND_PARTITIONS.
You can use data saves to protect against loss of data due to instance failure. All data loaded up to the last data save is protected against instance failure To continue the load after an instance failure, determine how many rows from the input file were processed before the failure, then use the SKIP option to skip those processed rows. If there were any indexes on the table, drop them before continuing the load, then recreate them after the load. See "Recovery" on page 8-13 for more information on media and instance failure.
Note: Indexes are not protected by a data save, because SQL*Loader does not build indexes until after data loading completes. (The only time indexes are built during the load is when pre-sorted data is loaded into an empty table - but these indexes are also unprotected.)
The parameter ROWS determines when data saves occur during a direct path load. The value you specify for ROWS is the number of rows you want SQL*Loader to read from the input file before saving inserts in the database.
The number of rows you specify for a data save is an approximate number. Direct loads always act on full data buffers that match the format of Oracle database blocks. So, the actual number of data rows saved is rounded up to a multiple of the number of rows in a database block.
SQL*Loader always reads the number of rows needed to fill a database block. Discarded and rejected records are then removed, and the remaining records are inserted into the database. So the actual number of rows inserted before a save is the value you specify, rounded up to the number of rows in a database block, minus the number of discarded and rejected records.
A data save is an expensive operation. The value for ROWS should be set high enough so that a data save occurs once every 15 minutes or longer. The intent is to provide an upper bound on the amount of work which is lost when an instance failure occurs during a long running direct path load. Setting the value of ROWS to a small number will have an adverse affect on performance.
In a conventional load, ROWS is the number of rows to read before a commit. A direct load data save is similar to a conventional load commit, but it is not identical. The similarities are:
The major difference is that the indexes will be unusable (in Index Unusable state) until the load completes.
SQL *Loader provides full support for data recovery when using the direct path option. There are two main types of recovery:
See Oracle8 Server Administrator's Guide for more information about recovery.
Because SQL*Loader writes directly to the database files, all rows inserted up to the last data save will automatically be present in the database files if the instance is restarted. Changes do not need to be recorded in the redo log file to make instance recovery possible.
If an instance failure occurs, the indexes being built may be left in Index Unusable state. Indexes which are Unusable must be re-built before using the table or partition. See "Indexes Left in Index Unusable State" on page 8-11 for more information on how to determine if an index has been left in Index Unusable state.
If redo log file archiving is enabled (you are operating in ARCHIVELOG mode), SQL*Loader logs loaded data when using the direct path, making media recovery possible. If redo log archiving is not enabled (you are operating in NOARCHIVELOG mode), then media recovery is not possible.
To recover a database file that was lost while it was being loaded, use the same method that you use to recover data loaded with the conventional path:
Data that is longer than SQL*Loader's maximum buffer size can be loaded on the direct path with either the PIECED option or by specifying the number of READBUFFERS. This section describes those two options.
The data can be loaded in sections with the pieced option if it is the last column of the logical record. The syntax for this specification is given "High-Level Syntax Diagrams" on page 5-5.
Declaring a column as PIECED informs the direct path loader that the field may be processed in pieces, one buffer at once.
The following restrictions apply when declaring a column as PIECED:
For example, a PIECED filed could span 3 records. SQL*Loader loads the piece from the first record and then reuses the buffer for the second buffer. After loading the second piece, the buffer is reused for the third record. If an error is then discovered, only the third record is placed in the bad file because the first two records no longer exist in the buffer. As a result, the record in the bad file would not be valid.
For data that is not divided into separate sections, or not in the last column, READBUFFERS can be specified. With READBUFFERS a buffer transfer area can be allocated that is large enough to hold the entire logical record at one time.
READBUFFERS specifies the number of buffers to use during a direct path load. (A LONG can span multiple buffers.) The default value is four buffers. If the number of read buffers is too small, the following error results:
ORA-02374 ... No more slots for read buffer queue
Note: Do not specify a value for READBUFFERS unless it becomes necessary, as indicated by ORA-2374. Values of READBUFFERS that are larger than necessary do not enhance performance. Instead, higher values unnecessarily increase system overhead.
You can control the time and temporary storage used during direct path loads.
To minimize time:
To minimize space:
SQL*Loader automatically adds extents to the table if necessary, but this process takes time. For faster loads into a new table, allocate the required extents when the table is created.
To calculate the space required by a table, see the chapter "Managing Database Files" in Oracle8 Server Administrator's Guide. Then use the INITIAL or MINEXTENTS clause in the SQL command CREATE TABLE to allocate the required space.
Another approach is to size extents large enough so that extent allocation is infrequent.
You can improve the performance of direct path loads by pre-sorting your data on indexed columns. Pre-sorting minimizes temporary storage requirements during the load. Pre-sorting also allows you to take advantage of high-performance sorting routines that are optimized for your operating system or application.
If the data is pre-sorted and the existing index is not empty, then pre-sorting minimizes the amount of temporary segment space needed for the new keys. The sort routine appends each new key to the key list. Instead of requiring extra space for sorting, only space for the keys is needed. To calculate the amount of storage needed, use a sort factor of 1.0 instead of 1.3. For more information on estimating storage requirements, see "Temporary Segment Storage Requirements" on page 8-10.
If pre-sorting is specified and the existing index is empty, then maximum efficiency is achieved. The sort routines are completely bypassed, with the merge phase of index creation. The new keys are simply inserted into the index. Instead of having a temporary segment and new index existing simultaneously with the empty, old index, only the new index exists. So, temporary storage is not required, and time is saved.
The SORTED INDEXES statement identifies the indexes on which the data is presorted. This statement is allowed only for direct path loads. See Chapter 5, "SQL*Loader Control File Reference" for the syntax, and see"Case 6: Loading Using the Direct Path Load Method" on page 4-20 for an illustration.
Generally, you specify only one index in the SORTED INDEXES statement because data that is sorted for one index is not usually in the right order for another index. When the data is in the same order for multiple indexes, however, all of the indexes can be specified at once.
All indexes listed in the SORTED INDEXES statement must be created before you start the direct path load.
If you specify an index in the SORTED INDEXES statement, and the data is not sorted for that index, then the index is left in Index Unusable state at the end of the load. The data is present, but any attempt to use the index results in an error. Any index which is left in Index Unusable state must be re-built after the load.
If you specify a multiple-column index in the SORTED INDEXES statement, the data should be sorted so that it is ordered first on the first column in the index, next on the second column in the index, and so on.
For example, if the first column of the index is city, and the second column is last name; then the data should be ordered by name within each city, as in the following list:
Albuquerque Adams
Albuquerque Hartstein
Albuquerque Klein
... ...
Boston Andrews
Boston Bobrowski
Boston Heigham
... ...
For the best overall performance of direct path loads, you should presort the data based on the index that requires the most temporary segment space. For example, if the primary key is one numeric column, and the secondary key consists of three text columns, then you can minimize both sort time and storage requirements by pre-sorting on the secondary key.
To determine the index that requires the most storage space, use the following procedure:
Frequent data saves resulting from a small ROWS value adversely affect the performance of a direct path load. Because direct path loads can be many times faster than conventional loads, the value of ROWS should be considerably higher for a direct load than it would be for a conventional load.
During a data save, loading stops until all of SQL*Loader's buffers are successfully written. You should select the largest value for ROWS that is consistent with safety. It is a good idea to determine the average time to load a row by loading a few thousand rows. Then you can use that value to select a good value for ROWS.
For example, if you can load 20,000 rows per minute, and you do not want to repeat more than 10 minutes of work after an interruption, then set ROWS to be 200,000 (20,000 rows/minute * 10 minutes).
One way to speed a direct load dramatically is to minimize use of the redo log. There are three ways to do this. You can disable archiving, you can specify that the load is UNRECOVERABLE, or you can set the NOLOG attribute of the objects being loaded. This section discusses all methods.
If media recovery is disabled, direct path loads do not generate full image redo.
Use UNRECOVERABLE to save time and space in the redo log file. An UNRECOVERABLE load does not record loaded data in the redo log file, instead, it generates invalidation redo. Note that UNRECOVERABLE applies to all objects loaded during the load session (both data and index segments.)
Therefore, media recovery is disabled for the loaded table, although database changes by other users may continue to be logged.
Note: Because the data load is not logged, you may want to make a backup of the data after loading.
If media recovery becomes necessary on data that was loaded with the UNRECOVERABLE phrase, the data blocks that were loaded are marked as logically corrupted.
To recover the data, drop and re-create the data. It is a good idea to do backups immediately after the load to preserve the otherwise unrecoverable data.
By default, a direct path load is RECOVERABLE. See "Data Definition Language (DDL) Syntax" on page 5-4 for information on RECOVERABLE and UNRECOVERABLE.
If a data or index segment has the NOLOG attribute set, then full image redo logging is disabled for that segment (invalidation redo is generated.) Use of the NOLOG attribute allows a finer degree of control over the objects which are not logged.
For both the conventional path and the direct path, SQL*Loader maintains all existing indexes for a table.
Index maintenance can be avoided by using one of the following methods:
Avoiding index maintenance saves temporary storage while using the direct load method. Avoiding index maintenance minimizes the amount of space required during the load, for the following reasons:
Avoiding index maintenance is quite reasonable when the number of rows to be loaded is large compared to the size of the table. But if relatively few rows are added to a large table, then the time required to re-sort the indexes may be excessive. In such cases, it is usually better to make use of the conventional path, or use the SINGLEROW option.
With the conventional path, arrays of rows are inserted with standard SQL INSERT statements - integrity constraints and insert triggers are automatically applied. But when loading data with the direct path, some integrity constraints and all database triggers are disabled. This section discusses the implications of using direct path loads with respect to these features.
During a direct path load, some integrity constraints are automatically disabled. Others are not. For a description of the constraints, see the chapter "Maintaining Data Integrity" of Oracle8 Server Application Developer's Guide.
The constraints that remain in force are:
Not Null constraints are checked at column array build time. Any row that violates this constraint is rejected. Unique constraints are verified when indexes are rebuilt at the end of the load. The index will be left in Index Unusable state if a violation is detected. See "Indexes Left in Index Unusable State" on page 8-11.
The following constraints are disabled:
When the load completes, the integrity constraints will be re-enabled automatically if the REENABLE clause is specified. The syntax for this clause is as follows:
The optional keyword DISABLED_CONSTRAINTS is provided for readability. If the EXCEPTIONS clause is included, the table must already exist and, you must be able to insert into it. This table contains the ROWIDs of all rows that violated one of the integrity constraints. It also contains the name of the constraint that was violated. See Oracle8 Server SQL Reference for instructions on how to create an exceptions table.
If the REENABLE clause is not used, then the constraints must be re-enabled manually. All rows in the table are verified then. If Oracle finds any errors in the new data, error messages are produced. The names of violated constraints and the ROWIDs of the bad data are placed in an exceptions table, if one is specified. See ENABLE in Oracle8 Server SQL Reference.
The SQL*Loader log file describes the constraints that were disabled, the ones that were re-enabled and what error, if any, prevented re-enabling of each constraint. It also contains the name of the exceptions table specified for each loaded table.
Attention: As long as bad data remains in the table, the integrity constraint cannot be successfully re-enabled.
Suggestion: Because referential integrity must be reverified for the entire table, performance may be improved by using the conventional path, instead of the direct path, when a small number of rows are to be loaded into a very large table.
Table insert triggers are also disabled when a direct path load begins. After the rows are loaded and indexes rebuilt, any triggers that were disabled are automatically re-enabled. The log file lists all triggers that were disabled for the load. There should not be any errors re-enabling triggers.
Unlike integrity constraints, insert triggers are not reapplied to the whole table when they are enabled. As a result, insert triggers do not fire for any rows loaded on the direct path. When using the direct path, the application must ensure that any behavior associated with insert triggers is carried out for the new rows.
Applications commonly use insert triggers to implement integrity constraints. Most of these application insert triggers are simple enough that they can be replaced with Oracle's automatic integrity constraints.
Sometimes an insert trigger cannot be replaced with Oracle's automatic integrity constraints. For example, if an integrity check is implemented with a table lookup in an insert trigger, then automatic check constraints cannot be used, because the automatic constraints can only reference constants and columns in the current row. This section describes two methods for duplicating the effects of such a trigger.
Before either method can be used, the table must be prepared. Use the following general guidelines to prepare the table:
After following this procedure, all newly loaded rows are identified, making it possible to operate on the new data without affecting the old rows.
Generally, you can use a database update trigger to duplicate the effects of an insert trigger. This method is the simplest. It can be used whenever the insert trigger does not raise any exceptions.
Copy the trigger. Change all occurrences of "new.column_name" to "old.column_name".
Note: Depending on the behavior of the trigger, it may be necessary to have exclusive update access to the table during this operation, so that other users do not inadvertently apply the trigger to rows they modify.
If the insert trigger can raise an exception, then more work is required to duplicate its effects. Raising an exception would prevent the row from being inserted into the table. To duplicate that effect with an update trigger, it is necessary to mark the loaded row for deletion.
The "new data" column cannot be used for a delete flag, because an update trigger cannot modify the column(s) that caused it to fire. So another column must be added to the table. This column marks the row for deletion. A null value means the row is valid. Whenever the insert trigger would raise an exception, the update trigger can mark the row as invalid by setting a flag in the additional column.
Summary: When an insert trigger can raise an exception condition, its effects can be duplicated by an update trigger, provided:
The following procedure always works, but it is more complex to implement. It can be used when the insert trigger raises exceptions. It does not require a second additional column; and, because it does not replace the update trigger, and it can be used without exclusive access to the table.
SQL*Loader needs to acquire several locks on the table to be loaded to disable triggers and constraints. If a competing process is enabling triggers or constraints at the same time that SQL*Loader is trying to disable them for that table, then SQL*Loader may not be able to acquire exclusive access to the table.
SQL*Loader attempts to handle this situation as gracefully as possible. It attempts to re-enable disabled triggers and constraints before exiting. However, the same table-locking problem that made it impossible for SQL*Loader to continue may also have made it impossible for SQL*Loader to finish enabling triggers and constraints. In such cases, triggers and constraints will remain permanently disabled until they are manually enabled.
Although such a situation is unlikely, it is possible. The best way to prevent it is to make sure that no applications are running that could enable triggers or constraints for the table, while the direct load is in progress.
If a direct load is aborted due to failure to acquire the proper locks, carefully check the log. It will show every trigger and constraint that was disabled, and each attempt to re-enable them. Any triggers or constraints that were not re-enabled by SQL*Loader should be manually enabled with the ENABLE clause described in Oracle8 Server SQL Reference.
If triggers or integrity constraints pose a problem, but you want faster loading, you should consider using concurrent conventional path loads. That is, use multiple load sessions executing concurrently on a multiple-CPU system. Split the input datafiles into separate files on logical record boundaries, and then load each such input datafile with a conventional path load session. The resulting load has the following attributes:
This section discusses three basic models of concurrency which can be used to minimize the elapsed time required for data loading:
Using multiple conventional path load sessions executing concurrently is discussed in the previous section. This technique can be used to load the same or different objects concurrently with no restrictions.
Inter-segment concurrency can be used for concurrent loading of different objects. This technique can be applied for concurrent direct path loading of different tables, or to concurrent direct path loading of different partitions of the same table.
When direct path loading a single partition, the following items should be considered:
SQL*Loader permits multiple, concurrent sessions to perform a direct path load into the same table, or into the same partition of a partitioned table. Multiple SQL*Loader sessions improve the performance of a direct path load given the available resources on your system.
This method of data loading is enabled by setting both the DIRECT and the PARALLEL option to TRUE, and is often referred to as a "parallel direct path load."
It is important to realize that parallelism is user managed, setting the PARALLEL option to TRUE only allows multiple concurrent direct path load sessions.
The following restrictions are enforced on parallel direct path loads:
If a parallel direct path load is being applied to a single partition, it is best that the data is pre-partitioned (otherwise the overhead of record rejection due to a partition mismatch slows down the load.)
Each SQL*Loader session takes a different datafile as input. In all sessions executing a direct load on the same table, you must set PARALLEL to TRUE. The syntax is:
PARALLEL can be specified on the command line or in a parameter file. It can also be specified in the control file with the OPTIONS clause.
For example, to invoke three SQL*Loader direct path load sessions on the same table, you would execute the following commands at the operating system prompt:
SQLLOAD USERID=SCOTT/TIGER CONTROL=LOAD1.CTL DIRECT=TRUE PARALLEL=TRUE
SQLLOAD USERID=SCOTT/TIGER CONTROL=LOAD2.CTL DIRECT=TRUE PARALLEL=TRUE
SQLLOAD USERID=SCOTT/TIGER CONTROL=LOAD3.CTL DIRECT=TRUE PARALLEL=TRUE
The previous commands must be executed in separate sessions, or if permitted on your operating system, as separate background jobs. Note the use of multiple control files. This allows you to be flexible in specifying the files to use for the direct path load (see the example of one of the control files below).
Note: Indexes are not maintained during a parallel load. Any indexes must be (re)created or rebuilt manually after the load completes. You can use the parallel index creation or parallel index rebuild feature to speed the building of large indexes after a parallel load.
When you perform a PARALLEL load, SQL*Loader creates temporary segments for each concurrent session and then merges the segments upon completion. The segment created from the merge is then added to the existing segment in the database above the segment's high water mark. The last extent used of each segment for each loader session is trimmed of any free space before being combined with the other extents of the SQL*Loader session.
When using parallel direct path loads, options are available for specifying attributes of the temporary segment to be allocated by the loader.
It is recommended that each concurrent direct path load session use files located on different disks to allow for the maximum I/O throughput. Using the FILE keyword of the OPTIONS clause you can specify the filename of any valid datafile in the tablespace of the object (table or partition) being loaded. The following example illustrates a portion of one of the control files used for the SQL*Loader sessions in the previous example:
LOAD DATA
INFILE 'load1.dat'
INSERT INTO TABLE emp
OPTIONS(FILE='/dat/data1.dat')
(empno POSITION(01:04) INTEGER EXTERNAL NULLIF empno=BLANKS
...
You can specify the database file from which the temporary segments are allocated with the FILE keyword in the OPTIONS clause for each object (table or partition) in the control file. You can also specify the FILE parameter on the command line of each concurrent SQL*Loader session, but then it will globally apply to all objects being loaded with that session.
The FILE keyword in Oracle8 has the following restrictions for direct path parallel loads:
The STORAGE keyword can be used to specify the storage attributes of the temporary segment(s) allocated for a parallel direct path load. If the STORAGE keyword is not used, the storage attributes of the segment containing the object (table, partition) being loaded are used.
OPTIONS (STORAGE=(MINEXTENTS n1 MAXEXTENTS n2 INITIAL n3[K|M]
NEXT n4[K|M] PCTINCREASE n5)
For example, the following STORAGE clause could be used:
OPTIONS (STORAGE=(INITIAL 100M NEXT 100M PCTINCREASE 0))
The STORAGE keyword can only be used in the control file, and not on the command line. Use of the STORAGE keyword to specify anything other than PCTINCREASE of 0, and INITIAL or NEXT values is strongly discouraged (and may be silently ignored in the future.)
Constraints and triggers must be enabled manually after all data loading is complete.
This section gives a few guidelines which can help to improve the performance of a load. If you must use a certain feature to load your data, by all means do so. But if you have control over the format of the data to be loaded, here are a few hints which can be used to improve load performance: