Oracle8i Utilities Release 8.1.5 A67792-01 |
|
This chapter describes the SQL*Loader control file syntax. The following topics are included:
You use SQL*Loader's data definition language (DDL) to control how SQL*Loader performs a data load into your database. You can also use DDL to manipulate the data you are loading.
The SQL*Loader control file is a repository that contains the DDL instructions that you have created to control where SQL*Loader will find the data to load, how SQL*Loader expects that data to be formatted, how SQL*Loader will be configured (memory management, rejecting records, interrupted load handling, etc.) as it loads the data, and how it will manipulate the data being loaded. You create the SQL*Loader control file and its contents using a simple text editor like vi, or xemacs.
The rest of this chapter explains how to use DDL to achieve your required data load.
The SQL*Loader DDL diagrams (sometimes called "railroad diagrams") used in this chapter to illustrate syntax use standard SQL syntax notation. For more information about the syntax notation used in this chapter, see the PL/SQL User's Guide and Reference or the preface in the Oracle8i SQL Reference.
See Control File Basics for more information
The following diagrams of DDL syntax are shown with certain clauses collapsed (position_spec, into_table clause, etc.). These diagrams are expanded and explained in more detail in Expanded DDL Syntax.
Note: The characterset specified does not apply to data in the control file.
Important: The syntax above is specific to the Unix platform. Please see your Oracle operating system-specific documentation for the syntax required by your platform.
Note: full_fieldname
is the full name of a field specified using dot notation. If the field col2
is an attribute of a column object col1
, when referring to col2
in one of the directives, you must use the notation col1.col2
. The column_name
and the full_fieldname
referencing/naming the same entity can be different because column_name
never includes the full name of the entity (no dot notation).
Note: Only fields which are loaded from a LOBFILE can be terminated by EOF.
Note: Fields terminated by EOF cannot be enclosed.
Note: If the table uses primary key OIDs instead of system-generated OIDs, do not specify an OID clause.
Note: The column_name
and the fieldname
referencing/naming the same entity can be different because column_name
never includes the full name of the entity (no dot notation).
Notes:
Note: No field_condition can be based on fields in a secondary data file (SDF).
Note: The first argument to the BFILE directive contains the DIRECTORY OBJECT (the server_directory alias). The second argument contains the filename.
Note: Conventional path loading does piecing when necessary. During direct path loads, piecing is done automatically, therefore, it is unnecessary to specify the PIECED keyword.
Note: Conventional path loading does piecing when necessary. During direct path loads, piecing is done automatically, therefore, it is unnecessary to specify the PIECED keyword. Note also that you cannot specify sql_string
for LOB fields (regardless of whether LOBFILE_spec is specified).
Notes:
Note: field_list
cannot contain a collection_fld_spec
Notes: A col_obj_spec
nested within a VARRAY cannot contain a collection_fld_spec
.
The <column_name>
specified as part of the field_list
must be the same as the <column_name>
preceding the keyword VARRAY.
Notes:
Note: If a field is specified as the argument to the COUNT clause, that field must be mapped into the datafile data which is convertible to an integer (e.g. the string of characters "124").
A position specification (pos_spec) provides the starting location for a field and, optionally, the ending location. pos_spec syntax is:
The position must be surrounded by parentheses. The starting location can be specified as a column number, as * (next column), or *+n (next column plus an offset). The start and end locations can be separated with a colon (:) or a dash (-).
A field condition compares a named field or an area of the record to a specified value. When the condition evaluates to true, the specified function is performed. For example, a true condition might cause the NULLIF function to insert a NULL data value, or cause DEFAULTIF to insert a default value. field_condition syntax is:
char_string and hex_string can be enclosed in either single quotation marks or double quotation marks. hex_string is a string of hexadecimal digits, where each pair of digits corresponds to one byte in the field. The BLANKS keyword allows you to test a field to see if it consists entirely of blanks. BLANKS is required when you are loading delimited data and you cannot predict the length of the field, or when you use a multi-byte character set that has multiple blanks.
There must not be any spaces between the operator and the operands. For example:
(1)='x'
is legal, while
(1) = 'x'
generates an error.
The column name you specify in a field condition must be one of the columns defined for the input record. It must be specified with double quotation marks if its name is a reserved word. See Specifying Filenames and Objects Names for more details.
The precision of a numeric field is the number of digits it contains. The length of a numeric field is the number of byte positions on the record. The byte length of a ZONED decimal field is the same as its precision. However, the byte length of a (packed) DECIMAL field is (p+1)/2, rounded up, where p is the number's precision, because packed numbers contain two digits (or digit and sign) per byte.
The date mask specifies the format of the date value. For more information, see the DATE datatype.
delimiter_spec can specify a termination delimiter, enclosure delimiters, or a combination of the two, as shown below:
Note: Only fields which are loaded from a LOB file can be TERMINATED BY EOF.
Note: Fields TERMINATED BY EOF cannot also be enclosed.
For more information, see Specifying Delimiters.
The following sections describe the various DDL entries in the SQL*Loader control file and their function. All statements use the data definition language syntax described in the previous sections. The control file statements are presented in the approximate order they would appear in the control file.
Comments can appear anywhere in the command section of the file, but they should not appear within the data. Precede any comment with two hyphens. For example,
--This is a Comment
All text to the right of the double hyphen is ignored, until the end of the line. Appendix Case 3: Loading a Delimited, Free-Format File shows comments in a control file.
The OPTIONS statement is useful when you typically invoke a control file with the same set of options. The OPTION statement precedes the LOAD DATA statement.
The OPTIONS parameter allows you to specify runtime arguments in the control file, rather than on the command line. The following arguments can be specified using the OPTIONS parameter. These arguments are described in greater detail in Chapter 6, "SQL*Loader Command-Line Reference".
SKIP = n LOAD = n ERRORS = n ROWS = n BINDSIZE = n SILENT = {FEEDBACK | ERRORS | DISCARDS | ALL} DIRECT = {TRUE | FALSE} PARALLEL = {TRUE | FALSE}
For example:
OPTIONS (BINDSIZE=100000, SILENT=(ERRORS, FEEDBACK) )
Note: Values specified on the command line override values specified in the OPTIONS statement in the control file.
SQL*Loader follows the SQL standard for specifying object names (for example, table and column names). This section explains certain exceptions to that standard and how to specify database objects and filenames in the SQL*Loader control file that require special treatment. It also shows how the escape character is used in quoted strings.
SQL and SQL*Loader reserved words must be specified within double quotation marks. The reserved words most likely to be column names are:
COUNT DATA DATE FORMAT OPTIONS PART POSITION
So, if you had an inventory system with columns named PART, COUNT, and DATA, you would specify these column names within double quotation marks in your SQL*Loader control file. For example:
INTO TABLE inventory (partnum INTEGER, "PART" CHAR(15), "COUNT" INTEGER, "DATA" VARCHAR2(30))
See Appendix A, "SQL*Loader Reserved Words", for a complete list of reserved words.
You must use double quotation marks if the object name contains special characters other than those recognized by SQL ($, #, _), or if the name is case sensitive.
You must specify SQL strings within double quotation marks. The SQL string applies SQL operators to data fields. See Applying SQL Operators to Fields for more information.
A control file entry cannot specify a SQL string for any field in the control file that uses a BFILE, SID, OID, or REF directive.
SQL strings cannot be used with column objects or collections, or attributes of column objects or collections.
If you encounter problems when trying to specify a complete pathname, it may be due to an operating system-specific incompatibility caused by special characters in the specification. In many cases, specifying the pathname within single quotation marks prevents errors.
If not, please see your operating system-specific documentation for possible solutions.
In DDL syntax, you can place a double quotation mark inside a string delimited by double quotation marks by preceding it with the escape character, "\" (if the escape is allowed on your operating system). The same rule applies when single quotation marks are required in a string delimited by single quotation marks.
For example, homedir\data"norm\myfile
contains a double quotation mark. Preceding the double quote with a backslash indicates that the double quote is to be taken literally:
INFILE 'homedir\data\"norm\mydata'
You can also put the escape character itself into a string by entering it twice:
For example:
"so'\"far" or 'so\'"far' is parsed as so'"far "'so\\far'" or '\'so\\far\'' is parsed as 'so\far' "so\\\\far" or 'so\\\\far' is parsed as so\\far
Note: A double quote in the initial position cannot be escaped, therefore you should avoid creating strings with an initial quote.
There are two kinds of character strings in a SQL*Loader control file that are not portable between operating systems: filename and file processing option strings. When converting to a different operating system, these strings will likely need to be modified. All other strings in a SQL*Loader control file should be portable between operating systems.
If your operating system uses the backslash character to separate directories in a pathname and if the version of Oracle running on your operating system implements the backslash escape character for filenames and other non-portable strings, then you must specify double backslashes in your pathnames and use single quotation marks.
Additional Information: Please see your Oracle operating system-specific documentation for information about which escape characters are required or allowed.
The version of Oracle running on your operating system may not implement the escape character for non-portable strings. When the escape character is disallowed, a backslash is treated as a normal character, rather than as an escape character (although it is still usable in all other strings). Then pathnames such as:
INFILE 'topdir\mydir\myfile'
can be specified normally. Double backslashes are not needed.
Because the backslash is not recognized as an escape character, strings within single quotation marks cannot be embedded inside another string delimited by single quotation marks. This rule also holds for double quotation marks: A string within double quotation marks cannot be embedded inside another string delimited by double quotation marks.
If your data is contained in the control file itself and not in a separate datafile, you must include it following the load configuration specifications.
Specify the BEGINDATA keyword before the first data record. The syntax is:
BEGINDATA data
BEGINDATA is used in conjunction with the INFILE keyword, as described by specifying INFILE *. Case 1: Loading Variable-Length Data provides an example.
Notes:
You use the INFILE keyword to specify a datafile or datafiles fully followed by a file-processing options string. You can specify multiple files by using multiple INFILE keywords. You can also specify the datafile from the command line, using the DATA parameter described in Command-Line Keywords.
Note: the command-line parameter DATA overrides the INFILE keyword contained in the control file.
To specify a file that contains the data to be loaded, use the INFILE keyword, followed by the filename and optional processing options string. Remember that a filename specified on the command line overrides the first INFILE keyword in the control file. If no filename is specified, the filename defaults to the control filename with an extension or file type of DAT.
If the control file itself contains the data to be loaded, specify an asterisk (*). This specification is described in Identifying Data in the Control File with BEGINDATA .
Note: IDDN has been retained for compatibility with DB2.
where:
INFILE or INDDN |
(Use INDDN when DB2 compatibility is required.) This keyword specifies that a datafile specification follows. |
filename |
Name of the file containing the data. Any spaces or punctuation marks in the filename must be enclosed in single quotation marks. See Specifying Filenames and Objects Names. |
* |
If your data is in the control file itself, use an asterisk instead of the filename. If you have data in the control file as well as datafiles, you must specify the asterisk first in order for the data to be read. |
processing_options |
This is the file-processing options string. It specifies the datafile format. It also optimizes datafile reads. See Specifying Datafile Format and Buffering. |
To load data from multiple datafiles in one SQL*Loader run, use an INFILE statement for each datafile. Datafiles need not have the same file processing options, although the layout of the records must be identical. For example, two files could be specified with completely different file processing options strings, and a third could consist of data in the control file.
You can also specify a separate discard file and bad file for each datafile. However, the separate bad files and discard files must be declared after each datafile name. For example, the following excerpt from a control file specifies four datafiles with separate bad and discard files:
INFILE mydat1.dat BADFILE mydat1.bad DISCARDFILE mydat1.dis INFILE mydat2.dat INFILE mydat3.dat DISCARDFILE mydat3.dis INFILE mydat4.dat DISCARDMAX 10 0
mydat3.dis
) is created, as needed.
mydat4.dsc
, if it is needed.
INFILE *
INFILE WHIRL
INFILE 'c:/topdir/subdir/datafile.dat'
Note: Filenames that include spaces or punctuation marks must be enclosed in single quotation marks. For more details on filename specification, see Specifying Filenames and Objects Names.
The READBUFFERS keyword controls memory usage by SQL*Loader. This keyword is used for direct path loads only. For more information, Maximizing Performance of Direct Path Loads.
When configuring SQL*Loader, you can specify an operating system-dependent file processing options string in the control file to control file processing. You use this string to specify file format and buffering.
Additional Information: For details on the syntax of the file processing options string, see your Oracle operating system-specific documentation.
For example, suppose that your operating system has the following option-string syntax:
where RECSIZE is the size of a fixed-length record, and BUFFERS is the number of buffers to use for asynchronous I/O.
To declare a file named MYDATA.DAT as a file that contains 80-byte records and instruct SQL*Loader to use eight I/O buffers, using this syntax you would use the following control file entry:
INFILE 'mydata.dat' "RECSIZE 80 BUFFERS 8"
Note: This example uses the recommended convention of single quotation marks for filenames and double quotation marks for everything else. See Specifying Filenames and Objects Names for more details.
When SQL*Loader executes, it can create a file called a bad file or reject file in which it places records that were rejected because of formatting errors or because they caused Oracle errors. If you have specified that a bad file is to be created, the following applies:
Additional Information: On some systems a new version of the file is created if a file with the same name already exists. See your Oracle operating system-specific documentation to find out if this is the case on your system.
To specify the name of this file, use the BADFILE keyword (or BADDN for DB2 compatibility), followed by the bad file filename. If you do not specify a name for the bad file, the name defaults to the name of the datafile with an extension or file type of BAD. You can also specify the bad file from the command line with the BAD parameter described in Command-Line Keywords.
A filename specified on the command line is associated with the first INFILE or INDDN clause in the control file, overriding any bad file that may have been specified as part of that clause.
The bad file is created in the same record and file format as the datafile so that the data can be reloaded after corrections. The syntax is
where:
BADFILE or BADDN |
(Use BADDN when DB2 compatibility is required.) This keyword specifies that a filename for the badfile follows. |
bad_filename |
Any valid filename specification for your platform. Any spaces or punctuation marks in the filename must be enclosed in single quotation marks. See Specifying Filenames and Objects Names. |
A bad file with filename UGH and default file extension or file type of .bad:
BADFILE UGH
A bad file with filename BAD0001 and file extension or file type of .rej:
BADFILE BAD0001.REJ BADFILE '/REJECT_DIR/BAD0001.REJ'
A record is rejected if it meets either of the following conditions:
If the data can be evaluated according to the WHEN-clause criteria (even with unbalanced delimiters) then it is either inserted or rejected.
If a record is rejected on insert, then no part of that record is inserted into any table. For example, if data in a record is to be inserted into multiple tables, and most of the inserts succeed, but one insert fails, then all the inserts from that record are rolled back. The record is then written to the bad file, where it can be corrected and reloaded. Previous inserts from records without errors are not affected.
The log file indicates the Oracle error for each rejected record. Case 4: Loading Combined Physical Records demonstrates rejected records.
Note: During a multi-table load, SQL*Loader ensures that, if a row is rejected from one table, it is also rejected from all other tables. This is to ensure that the row can be repaired in the bad file and reloaded to all tables consistently. Also, if a row is loaded into one table, it should be loaded into all other tables which don't filter it out. Otherwise, reloading a fixed version of the row from the bad file could cause the data to be loaded into some tables twice.
Therefore, when SQL*Loader encounters the maximum number of errors allowed for a multi-table load, it continues loading rows to ensure that valid rows loaded into previous tables are either loaded into all tables or filtered out of all tables.
Data from LOB files or secondary data files are not written to a bad file when there are rejected rows. If there is an error loading a LOB, the row is not rejected, rather the LOB field is left empty (not NULL with a length of zero (0) bytes).
During SQL*Loader execution, it can create a discard file for records that do not meet any of the loading criteria. The records contained in this file are called discarded records. Discarded records do not satisfy any of the WHEN clauses specified in the control file. These records differ from rejected records. Discarded records do not necessarily have any bad data. No insert is attempted on a discarded record.
A discard file is created according to the following rules:
To create a discard file, use any of the following syntax:
In a Control File | On the Command Line | |
DISCARDFILE filename |
|
DISCARD |
DISCARDDN filename (DB2) |
|
DISCARDMAX |
DISCARDS |
|
|
DISCARDMAX |
|
|
Note that you can specify the discard file directly with a parameter specifying its name, or indirectly by specifying the maximum number of discards.
To specify the name of the file, use the DISCARDFILE or DISCARDDN (for DB2-compatibility) keyword, followed by the filename.
where:
DISCARDFILE or DISCARDDN |
(Use DISCARDDN when DB2 compatibility is required.) This keyword specifies that a discard filename follows. |
discard_filename |
Any valid filename specification for you platform. Any spaces or punctuation marks in the filename must be enclosed in single quotation marks. See Specifying Filenames and Objects Names. |
The default filename is the name of the datafile, and the default file extension or file type is DSC. A discard filename specified on the command line overrides one specified in the control file. If a discard file with that name already exists, it is either overwritten or a new version is created, depending on your operating system.
The discard file is created with the same record and file format as the datafile. So it can easily be used for subsequent loads with the existing control file, after changing the WHEN clauses or editing the data.
A discard file with filename CIRCULAR and default file extension or file type of .dsc
:
DISCARDFILE CIRCULAR
A discard file named notappl
with the file extension or file type of .may
:
DISCARDFILE NOTAPPL.MAY
A full path to the discard file forget.me
:
DISCARDFILE '/DISCARD_DIR/FORGET.ME'
If there is no INTO TABLE keyword specified for a record, the record is discarded. This situation occurs when every INTO TABLE keyword in the SQL*Loader control file has a WHEN clause; and either the record fails to match any of them or all fields are null.
No records are discarded if an INTO TABLE keyword is specified without a WHEN clause. An attempt is made to insert every record into such a table. So records may be rejected, but none are discarded.
Case 4: Loading Combined Physical Records provides an example of using a discard file.
Data from LOB files or secondary data files are not written to a discard file when there are discarded rows.
You can limit the number of records to be discarded for each datafile:
where n must be an integer. When the discard limit is reached, processing of the datafile terminates and continues with the next datafile, if one exists.
You can specify a different number of discards for each datafile. Alternatively, if the number of discards is only specified once, then the maximum number of discards specified applies to all files.
If you specify a maximum number of discards, but no discard filename, SQL*Loader creates a discard file with the default filename and file extension or file type. Case 4: Loading Combined Physical Records provides an example.
You can specify the discard file from the command line, with the parameter DISCARDFILE described in Command-Line Keywords.
A filename specified on the command line overrides any bad file that you may have specified in the control file.
SQL*Loader supports different character encoding schemes (called character sets, or code pages). SQL*Loader uses Oracle's NLS (National Language Support) features to handle the various single-byte and multi-byte character encoding schemes available today. See the Oracle8i National Language Support Guide for information about supported character encoding schemes. The following sections provide a brief introduction to some of the supported schemes.
Multi-byte character sets support Asian languages. Data can be loaded in multi-byte format, and database objects (fields, tables, and so on) can be specified with multi-byte characters. In the control file, comments and object names may also use multi-byte characters.
SQL*Loader has the capacity to convert data from the datafile character set to the database character set, when they differ.
When using a conventional path load, data is converted into the session character set specified by the NLS_LANG initialization parameter for that session. The data is then loaded using SQL INSERT statements. The session character set is the character set supported by your terminal.
During a direct path load, data converts directly into the database character set. The direct path load method, therefore, allows data in a character set that is not supported by your terminal to be loaded.
Note: When data conversion is required, it is essential that the target character set contains a representation of all characters that exist in the data. Otherwise, characters that have no equivalent in the target character set are converted to a default character, with consequent loss of data.
When using the direct path, load method the database character set should be a superset of, or equivalent to, the datafile character sets. Similarly, during a conventional path load, the session character set should be a superset of, or equivalent to, the datafile character sets.
The character set used in each input file is specified with the CHARACTERSET keyword.
You use the CHARACTERSET keyword to specify to SQL*Loader which character set is used in each datafile. Different datafiles can be specified with different character sets. However, only one character set can be specified for each datafile.
Using the CHARACTERSET keyword causes character data to be automatically converted when it is loaded into the database. Only CHAR, DATE, and numeric EXTERNAL fields are affected. If the CHARACTERSET keyword is not specified, then no conversion occurs.
The CHARACTERSET syntax is:
CHARACTERSET character_set_spec
where character_set_spec is the acronym used by Oracle to refer to your particular encoding scheme.
Additional Information: For more information on supported character sets, code pages, and the NLS_LANG parameter, see the Oracle8i National Language Support Guide.
The SQL*Loader control file itself is assumed to be in the character set specified for your session by the NLS_LANG parameter. However, delimiters and comparison clause values must be specified to match the character set in use in the datafile. To ensure that the specifications are correct, it may be preferable to specify hexadecimal strings, rather than character string values.
Any data included after the BEGINDATA statement is also assumed to be in the character set specified for your session by the NLS_LANG parameter. Data that uses a different character set must be in a separate file.
You can specify one of the following methods for loading tables:
If the tables you are loading into are empty, use the INSERT option.
If the tables you are loading into already contain data, you have three options:
Warning: When the REPLACE or TRUNCATE keyword is specified, the entire table is replaced, not just individual rows. After the rows are successfully deleted, a commit is issued. You cannot recover the data that was in the table before the load, unless it was saved with Export or a comparable utility.
Note: This section corresponds to the DB2 keyword RESUME; users of DB2 should also refer to the description of RESUME in Appendix B, "DB2/DXT User Notes".
If data already exists in the table, SQL*Loader appends the new rows to it. If data doesn't already exist, the new rows are simply loaded. You must have SELECT privilege to use the APPEND option. Case 3: Loading a Delimited, Free-Format File provides an example.
All rows in the table are deleted and the new data is loaded. The table must be in your schema, or you must have DELETE privilege on the table. Case 4: Loading Combined Physical Records provides an example.
The row deletes cause any delete triggers defined on the table to fire. If DELETE CASCADE has been specified for the table, then the cascaded deletes are carried out, as well. For more information on cascaded deletes, see the "Data Integrity" chapter of Oracle8i Concepts.
The REPLACE method is a table replacement, not a replacement of individual rows. SQL*Loader does not update existing records, even if they have null columns. To update existing rows, use the following procedure:
For more information, see the "UPDATE" statement in Oracle8i SQL Reference.
Using this method, SQL*Loader uses the SQL TRUNCATE command to achieve the best possible performance. For the TRUNCATE command to operate, the table's referential integrity constraints must first be disabled. If they have not been disabled, SQL*Loader returns an error.
Once the integrity constraints have been disabled, DELETE CASCADE is no longer defined for the table. If the DELETE CASCADE functionality is needed, then the contents of the table must be manually deleted before the load begins.
The table must be in your schema, or you must have the DELETE ANY TABLE privilege.
Notes:
Unlike the SQL TRUNCATE option, this method re-uses a table's extents.
INSERT is SQL*Loader's default method. It requires the table to be empty before loading. SQL*Loader terminates with an error if the table contains rows. Case 1: Loading Variable-Length Data provides an example.
If SQL*Loader runs out of space for data rows or index entries, the load is discontinued. (For example, the table might reach its maximum number of extents.) Discontinued loads can be continued after more space is made available.
When a load is discontinued, any data already loaded remains in the tables, and the tables are left in a valid state. If the conventional path is used, all indexes are left in a valid state.
If the direct path load method is used, any indexes that run out of space are left in direct load state. They must be dropped before the load can continue. Other indexes are valid provided no other errors occurred. (See Indexes Left in Index Unusable State for other reasons why an index might be left in direct load state.)
SQL*Loader's log file tells you the state of the tables and indexes and the number of logical records already read from the input datafile. Use this information to resume the load where it left off.
Before continuing a direct path load, inspect the SQL*Loader log file to make sure that no indexes are in direct load state. Any indexes that are left in direct load state must be dropped before continuing the load. The indexes can then be re-created either before continuing or after the load completes.
To continue a discontinued direct or conventional path load involving only one table, specify the number of logical records to skip with the command-line parameter SKIP. If the SQL*Loader log file says that 345 records were previously read, then the command to continue would look like this:
SQLLDR USERID=scott/tiger CONTROL=FAST1.CTL DIRECT=TRUE SKIP=345
It is not possible for multiple tables in a conventional path load to become unsynchronized. So a multiple table conventional path load can also be continued with the command-line parameter SKIP. Use the same procedure that you would use for single-table loads, as described in the preceding paragraph.
If SQL*Loader cannot finish a multiple-table direct path load, the number of logical records processed could be different for each table. If so, the tables are not synchronized and continuing the load is slightly more complex.
To continue a discontinued direct path load involving multiple tables, inspect the SQL*Loader log file to find out how many records were loaded into each table. If the numbers are the same, you can use the previously described simple continuation.
If the numbers are different, use the CONTINUE_LOAD keyword and specify SKIP at the table level, instead of at the load level. These statements exist to handle unsynchronized interrupted loads.
Instead of specifying:
LOAD DATA...
at the start of the control file, specify:
Then, for each INTO TABLE clause, specify the number of logical records to skip for that table using the SKIP keyword:
... INTO TABLE emp SKIP 2345 ... INTO TABLE dept SKIP 514 ...
The CONTINUE_LOAD keyword is only needed after a direct load failure because multiple table loads cannot become unsynchronized when using the conventional path.
If you specify CONTINUE_LOAD, you cannot use the command-line parameter SKIP. You must use the table-level SKIP clause. If you specify LOAD, you can optionally use the command-line parameter SKIP, but you cannot use the table-level SKIP clause.
Since Oracle8i supports user-defined record sizes larger than 64k (see READSIZE (read buffer)), the need to fragment logical records into physical records is reduced. However, there may still be situations in which you may want to do so.
You can create one logical record from multiple physical records using one of the following two clauses, depending on your data:
CONCATENATE CONTINUEIF
CONCATENATE is appropriate in the simplest case, when SQL*Loader should always add the same number of physical records to form one logical record.
The syntax is:
CONCATENATE n
where n indicates the number of physical records to combine.
If the number of physical records to be continued varies, then CONTINUEIF must be used. The keyword CONTINUEIF is followed by a condition that is evaluated for each physical record, as it is read. For example, two records might be combined if there were a pound sign (#) in character position 80 of the first record. If any other character were there, the second record would not be added to the first. The full syntax for CONTINUEIF adds even more flexibility:
where:
Note: The positions in the CONTINUEIF clause refer to positions in each physical record. This is the only time you refer to character positions in physical records. All other references are to logical records.
For CONTINUEIF THIS and CONTINUEIF NEXT, the continuation field is removed from all physical records before the logical record is assembled. This allows data values to span the records with no extra characters (continuation characters) in the middle. Two examples showing CONTINUEIF THIS and CONTINUEIF NEXT follow:
CONTINUEIF THIS CONTINUEIF NEXT (1:2) = '%%' (1:2) ='%%'
Assume physical data records 12 characters long and that a period means a space:
%%aaaaaaaa.... ..aaaaaaaa.... %%bbbbbbbb.... %%bbbbbbbb.... ..cccccccc.... %%cccccccc.... %%dddddddddd.. ..dddddddddd.. %%eeeeeeeeee.. %%eeeeeeeeee.. ..ffffffffff.. %%ffffffffff..
The logical records would be the same in each case:
aaaaaaaa....bbbbbbbb....cccccccc.... dddddddddd..eeeeeeeeee..ffffffffff..
Notes:
In the first example, you specify that if the current physical record (record1) has an asterisk in column 1. Then the next physical record (record2) should be appended to it. If record2 also has an asterisk in column 1, then record3 is appended also.
If record2 does not have an asterisk in column 1, then it is still appended to record1, but record3 begins a new logical record.
CONTINUEIF THIS (1) = "*"
In the next example, you specify that if the current physical record (record1) has a comma in the last non-blank data column. Then the next physical record (record2) should be appended to it. If a record does not have a comma in the last column, it is the last physical record of the current logical record.
CONTINUEIF LAST = ","
In the last example, you specify that if the next physical record (record2) has a "10" in columns 7 and 8. Then it should be appended to the preceding physical record (record1). If a record does not have a "10" in columns 7 and 8, then it begins a new logical record.
CONTINUEIF NEXT (7:8) = '10'
Case 4: Loading Combined Physical Records provides an example of the CONTINUEIF clause.
This section describes the way in which you specify:
The INTO TABLE keyword of the LOAD DATA statement allows you to identify tables, fields, and datatypes. It defines the relationship between records in the datafile and tables in the database. The specification of fields and datatypes is described in later sections.
Among its many functions, the INTO TABLE keyword allows you to specify the table into which you load data. To load multiple tables, you include one INTO TABLE clause for each table you wish to load.
To begin an INTO TABLE clause, use the keywords INTO TABLE, followed by the name of the Oracle table that is to receive the data.
The table must already exist. The table name should be enclosed in double quotation marks if it is the same as any SQL or SQL*Loader keyword, if it contains any special characters, or if it is case sensitive.
INTO TABLE SCOTT."COMMENT" INTO TABLE SCOTT."Comment" INTO TABLE SCOTT."-COMMENT"
The user running SQL*Loader should have INSERT privileges on the table. Otherwise, the table name should be prefixed by the username of the owner as follows:
INTO TABLE SOPHIA.EMP
The INTO TABLE clause may include a table-specific loading method (INSERT, APPEND, REPLACE, or TRUNCATE) that applies only to that table. Specifying one of these methods within the INTO TABLE clause overrides the global table-loading method. The global table-loading method is INSERT, by default, unless a different method was specified before any INTO TABLE clauses. For more information on these options, see Loading into Empty and Non-Empty Tables.
The OPTIONS keyword can be specified for individual tables in a parallel load. (It is only valid for a parallel load.) For more information, see Parallel Data Loading Models.
You can choose to load or discard a logical record by using the WHEN clause to test a condition in the record.
The WHEN clause appears after the table name and is followed by one or more field conditions.
For example, the following clause indicates that any record with the value "q" in the fifth column position should be loaded:
WHEN (5) = 'q'
A WHEN clause can contain several comparisons provided each is preceded by AND. Parentheses are optional, but should be used for clarity with multiple comparisons joined by AND. For example
WHEN (DEPTNO = '10') AND (JOB = 'SALES')
To evaluate the WHEN clause, SQL*Loader first determines the values of all the fields in the record. Then the WHEN clause is evaluated. A row is inserted into the table only if the WHEN clause is true.
Field conditions are discussed in detail in Specifying Field Conditions. Case 5: Loading Data into Multiple Tables provides an example of the WHEN clause.
If a WHEN directive fails on a record, that record is discarded (skipped). Note also that, the skipped record is assumed to be contained completely in the main datafile, therefore, a secondary data file will not be affected if present.
If all data fields are terminated similarly in the datafile, you can use the FIELDS clause to indicate the default delimiters. The syntax is:
Note: Terminators are strings not limited to a single character. Also, TERMINATED BY EOF applies only to loading LOBs from SSDFs.
Note: Enclosure strings do not have to be a single character.
You can override the delimiter for any given column by specifying it after the column name. Case 3: Loading a Delimited, Free-Format File provides an example. See Specifying Delimiters for more information on delimiter specification.
When the control file definition specifies more fields for a record than are present in the record, SQL*Loader must determine whether the remaining (specified) columns should be considered null or whether an error should be generated.
If the control file definition explicitly states that a field's starting position is beyond the end of the logical record, then SQL*Loader always defines the field as null. If a field is defined with a relative position (such as DNAME and LOC in the example below), and the record ends before the field is found; then SQL*Loader could either treat the field as null or generate an error. SQL*Loader uses the presence or absence of the TRAILING NULLCOLS clause to determine the course of action.
TRAILING NULLCOLS tells SQL*Loader to treat any relatively positioned columns that are not present in the record as null columns.
For example, if the following data
10 Accounting
is read with the following control file
INTO TABLE dept TRAILING NULLCOLS ( deptno CHAR TERMINATED BY " ", dname CHAR TERMINATED BY WHITESPACE, loc CHAR TERMINATED BY WHITESPACE )
and the record ends after DNAME. The remaining LOC field is set to null. Without the TRAILING NULLCOLS clause, an error would be generated due to missing data.
Case 7: Extracting Data from a Formatted Report provides an example of TRAILING NULLCOLS.
This section describes the SQL*Loader options that control how index entries are created.
The SORTED INDEXES option applies to direct path loads. It tells SQL*Loader that the incoming data has already been sorted on the specified indexes, allowing SQL*Loader to optimize performance. Syntax for this feature is given in High-Level Syntax Diagrams. Further details are in SORTED INDEXES Statement.
The SINGLEROW option is intended for use during a direct path load with APPEND on systems with limited memory, or when loading a small number of rows into a large table. This option inserts each index entry directly into the index, one row at a time.
By default, SQL*Loader does not use SINGLEROW when APPENDing rows to a table. Instead, index entries are put into a separate, temporary storage area and merged with the original index at the end of the load. This method achieves better performance and produces an optimal index, but it requires extra storage space. During the merge, the original index, the new index, and the space for new entries all simultaneously occupy storage space.
With the SINGLEROW option, storage space is not required for new index entries or for a new index. The resulting index may not be as optimal as a freshly sorted one, but it takes less space to produce. It also takes more time, since additional UNDO information is generated for each index insert. This option is suggested for use when:
A field condition is a statement about a field in a logical record that evaluates as true or false. It is used in the NULLIF and DEFAULTIF clauses, as well as in the WHEN clause.
A field condition is similar to the condition in the CONTINUEIF clause, with two important differences. First, positions in the field condition refer to the logical record, not to the physical record. Second, you may specify either a position in the logical record or the name of a field that is being loaded.
where:
The BLANKS keyword makes it possible to determine easily if a field of unknown length is blank.
For example, use the following clause to load a blank field as null:
full_field_name ... NULLIF column_name=BLANKS
The BLANKS keyword only recognizes blanks, not tabs. It can be used in place of a literal string in any field comparison. The condition is TRUE whenever the column is entirely blank.
The BLANKS keyword also works for fixed-length fields. Using it is the same as specifying an appropriately-sized literal string of blanks. For example, the following specifications are equivalent:
fixed_field CHAR(2) NULLIF (fixed_field)=BLANKS fixed_field CHAR(2) NULLIF (fixed_field)=" "
Note: There can be more than one "blank" in a multi-byte character set. It is a good idea to use the BLANKS keyword with these character sets instead of specifying a string of blank characters.
The character string will match only a specific sequence of blank characters, while the BLANKS keyword will match combinations of different blank characters. For more information on multi-byte character sets, see Multi-Byte (Asian) Character Sets.
When a data field is compared to a shorter literal string, the string is padded for the comparison; character strings are padded with blanks; for example:
NULLIF (1:4)="_"
compares the data in position 1:4 with 4 blanks. If position 1:4 contains 4 blanks, then the clause evaluates as true.
Hexadecimal strings are padded with hexadecimal zeroes. The clause
NULLIF (1:4)=X'FF'
compares position 1:4 to hex 'FF000000'.
You may load any number of a table's columns. Columns defined in the database, but not specified in the control file, are assigned null values (this is the proper way to insert null values).
A column specification is the name of the column, followed by a specification for the value to be put in that column. The list of columns is enclosed by parentheses and separated with commas as follows:
( columnspec, columnspec, ... )
Each column name must correspond to a column of the table named in the INTO TABLE clause. A column name must be enclosed in quotation marks if it is a SQL or SQL*Loader reserved word, contains special characters, or is case sensitive.
If the value is to be generated by SQL*Loader, the specification includes the keyword RECNUM, the SEQUENCE function, or the keyword CONSTANT. See Generating Data.
If the column's value is read from the datafile, the data field that contains the column's value is specified. In this case, the column specification includes a column name that identifies a column in the database table, and a field specification that describes a field in a data record. The field specification includes position, datatype, null restrictions, and defaults.
It is not necessary to specify all attributes when loading column objects. Any missing attributes will be set to NULL.
Filler fields have names but they are not loaded into the table. However, filler fields can be used as arguments to init_specs (foe example, NULLIF and DEFAULTIF) as well as to directives (for example, SID, OID, REF, BFILE). Also, filler fields can occur anyplace in the data file. They can be inside of the field list for an object or inside the definition of a VARRAY. See New SQL*Loader DDL Behavior and Restrictions for more information on filler fields and their use.
The following an sample filler field specification:
field_1_count FILLER char, field_1 varray count(field_1_count) ( filler_field1 char{2}, field_1 column object ( attr1 char(2), filler_field2 char(2), attr2 char(2), ) filler_field3 char(3), ) filler_field4 char(6)
A field's datatype specification tells SQL*Loader how to interpret the data in the field. For example, a datatype of INTEGER specifies binary data, while INTEGER EXTERNAL specifies character data that represents a number. A CHAR field, however, can contain any character data.
You may specify one datatype for each field; if unspecified, CHAR is assumed.
SQL*Loader Datatypes describes how SQL*Loader datatypes are converted into Oracle datatypes and gives detailed information on each SQL*Loader's datatype.
Before the datatype is specified, the field's position must be specified.
To load data from the datafile SQL*Loader must know a field's location and its length. To specify a field's position in the logical record, use the POSITION keyword in the column specification. The position may either be stated explicitly or relative to the preceding field. Arguments to POSITION must be enclosed in parentheses, as follows:
where:
You may omit POSITION entirely. If you do, the position specification for the data field is the same as if POSITION(*) had been used.
For example
ENAME POSITION (1:20) CHAR EMPNO POSITION (22-26) INTEGER EXTERNAL ALLOW POSITION (*+2) INTEGER EXTERNAL TERMINATED BY "/"
Column ENAME is character data in positions 1 through 20, followed by column EMPNO, which is presumably numeric data in columns 22 through 27. Column ALLOW is offset from the end of EMPNO by +2. So it starts in column 29 and continues until a slash is encountered.
When you are determining field positions, be alert for TABs in the datafile. The following situation is highly likely when using SQL*Loader's advanced SQL string capabilities to load data from a formatted report:
These kinds of errors occur when the data contains TABs. When printed, each TAB expands to consume several columns on the paper. In the datafile, however, each TAB is still only one character. As a result, when SQL*Loader reads the datafile, the POSITION specifications are wrong.
To fix the problem, inspect the datafile for tabs and adjust the POSITION specifications, or else use delimited fields.
The use of delimiters to specify relative positioning of fields is discussed in detail in Specifying Delimiters. Especially note how the delimiter WHITESPACE can be used.
In a multiple table load, you specify multiple INTO TABLE clauses. When you specify POSITION(*) for the first column of the first table, the position is calculated relative to the beginning of the logical record. When you specify POSITION(*) for the first column of subsequent tables, the position is calculated relative to the last column of the last table loaded.
Thus, when a subsequent INTO TABLE clause begins, the position is not set to the beginning of the logical record automatically. This allows multiple INTO TABLE clauses to process different parts of the same physical record. For an example, see the second example in Extracting Multiple Logical Records.
A logical record may contain data for one of two tables, but not both. In this case, you would reset POSITION. Instead of omitting the position specification or using POSITION(*+n) for the first field in the INTO TABLE clause, use POSITION(1) or POSITION(n).
SITEID POSITION (*) SMALLINT SITELOC POSITION (*) INTEGER
If these were the first two column specifications, SITEID would begin in column1, and SITELOC would begin in the column immediately following.
ENAME POSITION (1:20) CHAR EMPNO POSITION (22-26) INTEGER EXTERNAL ALLOW POSITION (*+2) INTEGER EXTERNAL TERMINATED BY "/"
Column ENAME is character data in positions 1 through 20, followed by column EMPNO which is presumably numeric data in columns 22 through 26. Column ALLOW is offset from the end of EMPNO by +2, so it starts in column 28 and continues until a slash is encountered.
Multiple INTO TABLE statements allow you to:
In the first case, it is common for the INTO TABLE statements to refer to the same table. This section illustrates the different ways to use multiple INTO TABLE statements and shows you how to use the POSITION keyword.
Note: A key point when using multiple INTO TABLE statements is that field scanning continues from where it left off when a new INTO TABLE statement is processed. The remainder of this section details important ways to make use of that behavior. It also describes alternative ways using fixed field locations or the POSITION keyword.
Some data storage and transfer media have fixed-length physical records. When the data records are short, more than one can be stored in a single, physical record to use the storage space efficiently.
In this example, SQL*Loader treats a single physical record in the input file as two logical records and uses two INTO TABLE clauses to load the data into the EMP table. For example, if the data looks like
1119 Smith 1120 Yvonne 1121 Albert 1130 Thomas
then the following control file extracts the logical records:
INTO TABLE emp (empno POSITION(1:4) INTEGER EXTERNAL, ename POSITION(6:15) CHAR) INTO TABLE emp (empno POSITION(17:20) INTEGER EXTERNAL, ename POSITION(21:30) CHAR)
The same record could be loaded with a different specification. The following control file uses relative positioning instead of fixed positioning. It specifies that each field is delimited by a single blank (" "), or with an undetermined number of blanks and tabs (WHITESPACE):
INTO TABLE emp (empno INTEGER EXTERNAL TERMINATED BY " ", ename CHAR TERMINATED BY WHITESPACE) INTO TABLE emp (empno INTEGER EXTERNAL TERMINATED BY " ", ename CHAR) TERMINATED BY WHITESPACE)
The important point in this example is that the second EMPNO field is found immediately after the first ENAME, although it is in a separate INTO TABLE clause. Field scanning does not start over from the beginning of the record for a new INTO TABLE clause. Instead, scanning continues where it left off.
To force record scanning to start in a specific location, you use the POSITION keyword. That mechanism is described next.
A single datafile might contain records in a variety of formats. Consider the following data, in which EMP and DEPT records are intermixed:
1 50 Manufacturing -- DEPT record 2 1119 Smith 50 -- EMP record 2 1120 Snyder 50 1 60 Shipping 2 1121 Stevens 60
A record ID field distinguishes between the two formats. Department records have a "1" in the first column, while employee records have a "2". The following control file uses exact positioning to load this data:
INTO TABLE dept WHEN recid = 1 (recid POSITION(1:1) INTEGER EXTERNAL, deptno POSITION(3:4) INTEGER EXTERNAL, ename POSITION(8:21) CHAR) INTO TABLE emp WHEN recid <> 1 (recid POSITION(1:1) INTEGER EXTERNAL, empno POSITION(3:6) INTEGER EXTERNAL, ename POSITION(8:17) CHAR, deptno POSITION(19:20) INTEGER EXTERNAL)
Again, the records in the previous example could also be loaded as delimited data. In this case, however, it is necessary to use the POSITION keyword. The following control file could be used:
INTO TABLE dept WHEN recid = 1 (recid INTEGER EXTERNAL TERMINATED BY WHITESPACE, deptno INTEGER EXTERNAL TERMINATED BY WHITESPACE, dname CHAR TERMINATED BY WHITESPACE) INTO TABLE emp WHEN recid <> 1 (recid POSITION(1) INTEGER EXTERNAL TERMINATED BY ' ', empno INTEGER EXTERNAL TERMINATED BY ' ' ename CHAR TERMINATED BY WHITESPACE, deptno INTEGER EXTERNAL TERMINATED BY ' ')
The POSITION keyword in the second INTO TABLE clause is necessary to load this data correctly. This keyword causes field scanning to start over at column 1 when checking for data that matches the second format. Without it, SQL*Loader would look for the RECID field after DNAME.
By using the POSITION clause with multiple INTO TABLE clauses, data from a single record can be loaded into multiple normalized tables. See Case 5: Loading Data into Multiple Tables.
Multiple INTO TABLE clauses allow you to extract multiple logical records from a single input record and recognize different record formats in the same file.
For delimited data, proper use of the POSITION keyword is essential for achieving the expected results.
When the POSITION keyword is not used, multiple INTO TABLE clauses process different parts of the same (delimited data) input record, allowing multiple tables to be loaded from one record. When the POSITION keyword is used, multiple INTO TABLE clauses can process the same record in different ways, allowing multiple formats to be recognized in one input file.
The functions described in this section provide the means for SQL*Loader to generate the data stored in the database row, rather than reading it from a datafile. The following functions are described:
It is possible to use SQL*Loader to generate data by specifying only sequences, record numbers, system dates, and constants as field specifications.
SQL*Loader inserts as many rows as are specified by the LOAD keyword. The LOAD keyword is required in this situation. The SKIP keyword is not permitted.
SQL*Loader is optimized for this case. Whenever SQL*Loader detects that only generated specifications are used, it ignores any specified datafile -- no read I/O is performed.
In addition, no memory is required for a bind array. If there are any WHEN clauses in the control file, SQL*Loader assumes that data evaluation is necessary, and input records are read.
This is the simplest form of generated data. It does not vary during the load, and it does not vary between loads.
To set a column to a constant value, use the keyword CONSTANT followed by a value:
CONSTANT value
CONSTANT data is interpreted by SQL*Loader as character input. It is converted, as necessary, to the database column type.
You may enclose the value within quotation marks, and must do so if it contains white space or reserved words. Be sure to specify a legal value for the target column. If the value is bad, every row is rejected.
Numeric values larger than 2**32 - 1 (4,294,967,295) must be enclosed in quotes.
Note: Do not use the CONSTANT keyword to set a column to null. To set a column to null, do not specify that column at all. Oracle automatically sets that column to null when loading the row. The combination of CONSTANT and a value is a complete column specification.
Use the RECNUM keyword after a column name to set that column to the number of the logical record from which that row was loaded. Records are counted sequentially from the beginning of the first datafile, starting with record 1. RECNUM is incremented as each logical record is assembled. Thus it increments for records that are discarded, skipped, rejected, or loaded. If you use the option SKIP=10, the first record loaded has a RECNUM of 11.
The combination of column name and the RECNUM keyword is a complete column specification.
column_name RECNUM
A column specified with SYSDATE gets the current system date, as defined by the SQL language SYSDATE function. See the section "DATE Datatype" in Oracle8i SQL Reference.
The combination of column name and the SYSDATE keyword is a complete column specification.
column_name SYSDATE
The database column must be of type CHAR or DATE. If the column is of type CHAR, then the date is loaded in the form 'dd-mon-yy.' After the load, it can be accessed only in that form. If the system date is loaded into a DATE column, then it can be accessed in a variety of forms that include the time and the date.
A new system date/time is used for each array of records inserted in a conventional path load and for each block of records loaded during a direct path load.
The SEQUENCE keyword ensures a unique value for a particular column. SEQUENCE increments for each record that is loaded or rejected. It does not increment for records that are discarded or skipped.
The combination of column name and the SEQUENCE function is a complete column specification.
where:
If a row is rejected (that is, it has a format error or causes an Oracle error), the generated sequence numbers are not reshuffled to mask this. If four rows are assigned sequence numbers 10, 12, 14, and 16 in a particular column, and the row with 12 is rejected; the three rows inserted are numbered 10, 14, and 16, not 10, 12, 14. This allows the sequence of inserts to be preserved despite data errors. When you correct the rejected data and reinsert it, you can manually set the columns to agree with the sequence.
Case 3: Loading a Delimited, Free-Format File provides an example the SEQUENCE function.
Because a unique sequence number is generated for each logical input record, rather than for each table insert, the same sequence number can be used when inserting data into multiple tables. This is frequently useful behavior.
Sometimes, you might want to generate different sequence numbers for each INTO TABLE clause. For example, your data format might define three logical records in every input record. In that case, you can use three INTO TABLE clauses, each of which inserts a different part of the record into the same table. Note that, when you use SEQUENCE(MAX), SQL*Loader will use the maximum from each table which can lead to inconsistencies in sequence numbers.
To generate sequence numbers for these records, you must generate unique numbers for each of the three inserts. There is a simple technique to do so. Use the number of table-inserts per record as the sequence increment and start the sequence numbers for each insert with successive numbers.
Suppose you want to load the following department names into the DEPT table. Each input record contains three department names, and you want to generate the department numbers automatically.
Accounting Personnel Manufacturing Shipping Purchasing Maintenance ...
You could use the following control file entries to generate unique department numbers:
INTO TABLE dept (deptno sequence(1, 3), dname position(1:14) char) INTO TABLE dept (deptno sequence(2, 3), dname position(16:29) char) INTO TABLE dept (deptno sequence(3, 3), dname position(31:44) char)
The first INTO TABLE clause generates department number 1, the second number 2, and the third number 3. They all use 3 as the sequence increment (the number of department names in each record). This control file loads Accounting as department number 1, Personnel as 2, and Manufacturing as 3.
The sequence numbers are then incremented for the next record, so Shipping loads as 4, Purchasing as 5, and so on.
SQL*Loader has a rich palette of datatypes. These datatypes are grouped into portable and non-portable datatypes. Within each of these two groups, the datatypes are subgrouped into length-value datatypes and value datatypes.
The main grouping, portable vs. non-portable, refers to the platform dependency of the datatype. This issue arises due to a number of platform specificities such as differences in the byte ordering schemes of different platforms (big-endian vs. little-endian), differences in how many bits a particular platform is (16 bit, 32 bit, 64 bit), differences in signed number representation schemes (2's complement vs. 1's complement), etc. Note that not all of these problems apply to all of the non-portable datatypes.
The sub-grouping, value vs. length-value addresses different issues. While value datatypes assume a single part to a datafield, length-value datatypes require that the datafield consist of two sub fields -- the length subfield which specifies how long the second (value) subfield is.
VALUE Datatypes | Length-Value Datatypes |
INTEGER |
|
SMALLINT |
VARCHAR |
FLOAT |
VARGRAPHIC |
DOUBLE |
VARRAW |
BYTEINT |
LONG VARRAW |
ZONED |
|
(packed) DECIMAL |
|
The data is a full-word binary integer (unsigned). If you specify start:end in the POSITION clause, end is ignored. The length of the field is the length of a full-word integer on your system. (Datatype LONG INT in C.) This length cannot be overridden in the control file.
INTEGER
The data is a half-word binary integer (unsigned). If you specify start:end in the POSITION clause, end is ignored. The length of the field is a half-word integer is on your system.
SMALLINT
Additional Information: This is the SHORT INT datatype in the C programming language. One way to determine its length is to make a small control file with no data and look at the resulting log file. This length cannot be overridden in the control file. See your Oracle operating system-specific documentation for details.
The data is a single-precision, floating-point, binary number. If you specify end in the POSITION clause, it is ignored. The length of the field is the length of a single-precision, floating-point binary number on your system. (Datatype FLOAT in C.) This length cannot be overridden in the control file.
The data is a double-precision, floating-point binary number. If you specify end in the POSITION clause, it is ignored. The length of the field is the length of a double-precision, floating-point binary number on your system. (Datatype DOUBLE or LONG FLOAT in C.) This length cannot be overridden in the control file.
DOUBLE
The decimal value of the binary representation of the byte is loaded. For example, the input character x"1C" is loaded as 28. The length of a BYTEINT field is always 1 byte. If POSITION(start:end) is specified, end is ignored. (Datatype UNSIGNED CHAR in C.)
The syntax for this datatype is:
BYTEINT
An example is:
(column1 position(1) BYTEINT, column2 BYTEINT, ... )
ZONED data is in zoned decimal format: a string of decimal digits, one per byte, with the sign included in the last byte. (In COBOL, this is a SIGN TRAILING field.) The length of this field is equal to the precision (number of digits) that you specify.
The syntax for this datatype is:
where precision is the number of digits in the number, and scale (if given) is the number of digits to the right of the (implied) decimal point. For example:
sal POSITION(32) ZONED(8),
specifies an 8-digit integer starting at position 32.
DECIMAL data is in packed decimal format: two digits per byte, except for the last byte which contains a digit and sign. DECIMAL fields allow the specification of an implied decimal point, so fractional values can be represented.
The syntax for the this datatype is:
where:
For example,
sal DECIMAL (7,2)
would load a number equivalent to +12345.67. In the data record, this field would take up 4 bytes. (The byte length of a DECIMAL field is equivalent to (N+1)/2, rounded up, where N is the number of digits in the value, and one is added for the sign.)
The data is a varying-length, double-byte character string. It consists of a length subfield followed by a string of double-byte characters (DBCS).
Additional Information: The size of the length subfield is the size of the SQL*Loader SMALLINT datatype on your system (C type SHORT INT). See SMALLINT for more information.
The length of the current field is given in the first two bytes. This length is a count of graphic (double-byte) characters. So it is multiplied by two to determine the number of bytes to read.
The syntax for this datatype is:
A maximum length specified after the VARGRAPHIC keyword does not include the size of the length subfield. The maximum length specifies the number of graphic (double byte) characters. So it is also multiplied by two to determine the maximum length of the field in bytes.
The default maximum field length is 4Kb graphic characters, or 8 Kb
(2 * 4Kb). It is a good idea to specify a maximum length for such fields whenever possible, to minimize memory requirements. See Determining the Size of the Bind Array for more details.
The POSITION clause, if used, gives the location of the length subfield, not of the first graphic character. If you specify POSITION(start:end), the end location determines a maximum length for the field. Both start and end identify single-character (byte) positions in the file. Start is subtracted from (end + 1) to give the length of the field in bytes. If a maximum length is specified, it overrides any maximum length calculated from POSITION.
If a VARGRAPHIC field is truncated by the end of the logical record before its full length is read, a warning is issued. Because a VARCHAR field's length is embedded in every occurrence of the input data for that field, it is assumed to be accurate.
VARGRAPHIC data cannot be delimited.
A VARCHAR field is a length-value datatype. It consists of a binary length subfield followed by a character string of the specified length.
Additional Information: The size of the length subfield is the size of the SQL*Loader SMALLINT datatype on your system (C type SHORT INT). See SMALLINT for more information.
The syntax for this datatype is:
A maximum length specified in the control file does not include the size of the length subfield. If you specify the optional maximum length after the VARCHAR keyword, then a buffer of that size is allocated for these fields.
The default buffer size is 4 Kb. Specifying the smallest maximum length that is needed to load your data can minimize SQL*Loader's memory requirements, especially if you have many VARCHAR fields. See Determining the Size of the Bind Array for more details.
The POSITION clause, if used, gives the location of the length subfield, not of the first text character. If you specify POSITION(start:end), the end location determines a maximum length for the field. Start is subtracted from (end + 1) to give the length of the field in bytes. If a maximum length is specified, it overrides any length calculated from POSITION.
If a VARCHAR field is truncated by the end of the logical record before its full length is read, a warning is issued. Because a VARCHAR field's length is embedded in every occurrence of the input data for that field, it is assumed to be accurate.
VARCHAR data cannot be delimited.
VARRAW is made up of a two byte binary length-subfield followed by a RAW string value-subfield.
The syntax for this datatype is shown in the diagram for datatype_spec.
VARRAW results in a VARRAW with 2 byte length-subfield and a max size of 4 Kb (i.e. default). VARRAW(65000) results in a VARRAW whose length subfield is 2 bytes and has a max size of 65000 bytes.
LONG VARRAW is a VARRAW with a four byte length-subfield instead of a two byte length-subfield.
The syntax for this datatype is shown in the diagram for datatype_spec.
LONG VARRAW results in a VARRAW with 4 byte length-subfield and a max size of 4 Kb (i.e. default). LONG VARRAW(300000) results in a VARRAW whose length subfield is 4 bytes and has a max size of 300000 bytes.
VALUE Datatypes | Length-Value Datatypes |
|
|
CHAR |
VARCHARC |
DATE |
VARRAWC |
INTEGER EXTERNAL |
|
RAW |
|
GRAPHIC |
|
GRAPHIC EXTERNAL |
|
The character datatypes are CHAR, DATE, and the numeric EXTERNAL datatypes. These fields can be delimited and can have lengths (or maximum lengths) specified in the control file.
The data field contains character data. The length is optional and is taken from the POSITION specification if it is not present here. If present, this length overrides the length in the POSITION specification. If no length is given, CHAR data is assumed to have a length of 1. The syntax is:
A field of datatype CHAR may also be variable-length delimited or enclosed. See Specifying Delimiters.
Attention: If the column in the database table is defined as LONG or a VARCHAR2, you must explicitly specify a maximum length (maximum for a LONG is two gigabytes) either with a length specifier on the CHAR keyword or with the POSITION keyword. This guarantees that a large enough buffer is allocated for the value and is necessary even if the data is delimited or enclosed.
The data field contains character data that should be converted to an Oracle date using the specified date mask. The syntax is:
For example:
LOAD DATA INTO TABLE DATES (COL_A POSITION (1:15) DATE "DD-Mon-YYYY") BEGINDATA 1-Jan-1991 1-Apr-1991 28-Feb-1991
Attention: Whitespace is ignored and dates are parsed from left to right unless delimiters are present.
The length specification is optional, unless a varying-length date mask is specified. In the example above, the date mask specifies a fixed-length date format of 11 characters. SQL*Loader counts 11 characters in the mask, and therefore expects a maximum of 11 characters in the field, so the specification works properly. But, with a specification such as
DATE "Month dd, YYYY"
the date mask is 14 characters, while the maximum length of a field such as
September 30, 1991
is 18 characters. In this case, a length must be specified. Similarly, a length is required for any Julian dates (date mask "J")--a field length is required any time the length of the date string could exceed the length of the mask (that is, the count of characters in the mask).
If an explicit length is not specified, it can be derived from the POSITION clause. It is a good idea to specify the length whenever you use a mask, unless you are absolutely sure that the length of the data is less than, or equal to, the length of the mask.
An explicit length specification, if present, overrides the length in the POSITION clause. Either of these overrides the length derived from the mask. The mask may be any valid Oracle date mask. If you omit the mask, the default Oracle date mask of "dd-mon-yy" is used.
The length must be enclosed in parentheses and the mask in quotation marks. Case 3: Loading a Delimited, Free-Format File provides an example of the DATE datatype.
A field of datatype DATE may also be specified with delimiters. For more information, see Specifying Delimiters.
A date field that consists entirely of whitespace produces an error unless NULLIF BLANKS is specified. For more information, see Loading All-Blank Fields.
The data is a string of double-byte characters (DBCS). Oracle does not support DBCS, however SQL*Loader reads DBCS as single bytes. Like RAW data, GRAPHIC fields are stored without modification in whichever column you specify.
The syntax for this datatype is:
For GRAPHIC and GRAPHIC EXTERNAL, specifying POSITION(start:end) gives the exact location of the field in the logical record.
If you specify the length after the GRAPHIC (EXTERNAL) keyword, however, then you give the number of double-byte graphic characters. That value is multiplied by 2 to find the length of the field in bytes. If the number of graphic characters is specified, then any length derived from POSITION is ignored. No delimited datafield specification is allowed with GRAPHIC datatype specification.
If the DBCS field is surrounded by shift-in and shift-out characters, use GRAPHIC EXTERNAL. This is identical to GRAPHIC, except that the first and last characters (the shift-in and shift-out) are not loaded. The syntax for this datatype is:
where:
GRAPHIC |
Data is double-byte characters. |
EXTERNAL |
First and last characters are ignored. |
graphic_char_length |
Length in DBCS (see GRAPHIC above). |
For example, let [ ] represent shift-in and shift-out characters, and let # represent any double-byte character.
To describe ####, use "POSITION(1:4) GRAPHIC" or "POSITION(1) GRAPHIC(2)".
To describe [####], use "POSITION(1:6) GRAPHIC EXTERNAL" or "POSITION(1) GRAPHIC EXTERNAL(2)".
The numeric external datatypes are the numeric datatypes (INTEGER, FLOAT, DECIMAL, and ZONED) specified with the EXTERNAL keyword with optional length and delimiter specifications.
These datatypes are the human-readable, character form of numeric data. Numeric EXTERNAL may be specified with lengths and delimiters, just like CHAR data. Length is optional, but if specified, overrides POSITION.
The syntax for this datatype is:
Attention: The data is a number in character form, not binary representation. So these datatypes are identical to CHAR and are treated identically, except for the use of DEFAULTIF. If you want the default to be null, use CHAR; if you want it to be zero, use EXTERNAL. See also Setting a Column to Null or Zero and DEFAULTIF Clause.
FLOAT EXTERNAL data can be given in either scientific or regular notation. Both "5.33" and "533E-2" are valid representations of the same value.
The data is raw, binary data loaded "as is". It does not undergo character set conversion. If loaded into a RAW database column, it is not converted by Oracle. If it is loaded into a CHAR column, Oracle converts it to hexadecimal. It cannot be loaded into a DATE or number column.
The syntax for this datatype is
The length of this field is the number of bytes specified in the control file. This length is limited only by the length of the target column in the database and by memory resources. RAW datafields can not be delimited.
The datatype VARCHARC consists of a character length-subfield followed by a character string value-subfield.
The syntax for this datatype is shown in the diagram for datatype_spec.
For example:
The datatype VARRAWC consists of a RAW string value-subfield.
The syntax for this datatype is shown in the diagram for datatype_spec.
For example:
There are several ways to specify a length for a field. If multiple lengths are specified and they conflict, then one of the lengths takes precedence. A warning is issued when a conflict exists. The following rules determine which field length is used:
For example, if the native datatype INTEGER is 4 bytes long and the following field specification is given:
column1 POSITION(1:6) INTEGER
then a warning is issued, and the proper length (4) is used. In this case, the log file shows the actual length used under the heading "Len" in the column table:
Column Name Position Len Term Encl Datatype ----------------------- --------- ----- ---- ---- --------- COLUMN1 1:6 4 INTEGER
The datatype specifications in the control file tell SQL*Loader how to interpret the information in the datafile. The server defines the datatypes for the columns in the database. The link between these two is the column name specified in the control file.
SQL*Loader extracts data from a field in the input file, guided by the datatype specification in the control file. SQL*Loader then sends the field to the server to be stored in the appropriate column (as part of an array of row inserts).
The server does any necessary data conversion to store the data in the proper internal format. Note that the client does datatype conversion for fields in collections columns (VARRAYs and nested tables). It does not do datatype conversion when loading nested tables as a separate table from the parent.
The datatype of the data in the file does not necessarily need to be the same as the datatype of the column in the Oracle table. Oracle automatically performs conversions, but you need to ensure that the conversion makes sense and does not generate errors. For instance, when a datafile field with datatype CHAR is loaded into a database column with datatype NUMBER, you must make sure that the contents of the character field represent a valid number.
Note: SQL*Loader does not contain datatype specifications for Oracle internal datatypes such as NUMBER or VARCHAR2. SQL*Loader's datatypes describe data that can be produced with text editors (character datatypes) and with standard programming languages (native datatypes). However, although SQL*Loader does not recognize datatypes like NUMBER and VARCHAR2, any data that Oracle is capable of converting may be loaded into these or other database columns.
The boundaries of CHAR, DATE, or numeric EXTERNAL fields may also be marked by specific delimiter characters contained in the input data record. You indicate how the field is delimited by using a delimiter specification after specifying the datatype.
Delimited data can be TERMINATED or ENCLOSED.
TERMINATED fields are read from the starting position of the field up to, but not including, the first occurrence of the delimiter character. If the terminator delimiter is found in the first column position, the field is null.
If TERMINATED BY WHITESPACE is specified, data is read until the first occurrence of a whitespace character (space, tab, newline). Then the current position is advanced until no more adjacent whitespace characters are found. This allows field values to be delimited by varying amounts of whitespace.
Enclosed fields are read by skipping whitespace until a non-whitespace character is encountered. If that character is the delimiter, then data is read up to the second delimiter. Any other character causes an error.
If two delimiter characters are encountered next to each other, a single occurrence of the delimiter character is used in the data value. For example, 'DON''T' is stored as DON'T. However, if the field consists of just two delimiter characters, its value is null. You may specify a TERMINATED BY clause, an ENCLOSED BY clause, or both. If both are used, the TERMINATED BY clause must come first. The syntax for delimiter specifications is:
where:
Here are some examples, with samples of the data they describe:
TERMINATED BY ',' a data string, ENCLOSED BY '"' a data string" TERMINATED BY ',' ENCLOSED BY '" a data string", ENCLOSED BY "(" AND ')'(a data string)
Sometimes the same punctuation mark that is a delimiter also needs to be included in the data. To make that possible, two adjacent delimiter characters are interpreted as a single occurrence of the character, and this character is included in the data. For example, this data:
(The delimiters are left parentheses, (, and right parentheses, )).)
with this field specification:
ENCLOSED BY "(" AND ")"
puts the following string into the database:
The delimiters are left paren's, (, and right paren's, ).
For this reason, problems can arise when adjacent fields use the same delimiters. For example, the following specification:
field1 TERMINATED BY "/" field2 ENCLOSED by "/"
the following data will be interpreted properly:
This is the first string/ /This is the second string/
But if field1 and field2 were adjacent, then the results would be incorrect, because
This is the first string//This is the second string/
would be interpreted as a single character string with a "/" in the middle, and that string would belong to field1.
The default maximum length of delimited data is 255 bytes. So delimited fields can require significant amounts of storage for the bind array. A good policy is to specify the smallest possible maximum value; see Determining the Size of the Bind Array.
Trailing blanks can only be loaded with delimited datatypes. If a data field is nine characters long and contains the value DANIELbbb, where bbb is three blanks, it is loaded into Oracle as "DANIEL" if declared as CHAR(9). If you want the trailing blanks, you could declare it as CHAR(9) TERMINATED BY ':', and add a colon to the datafile so that the field is DANIELbbb:. This field is loaded as "DANIEL ", with the trailing blanks. For more discussion on whitespace in fields, see Trimming Blanks and Tabs.
A control file can specify multiple lengths for the character-data fields CHAR, DATE, and numeric EXTERNAL. If conflicting lengths are specified, one of the lengths takes precedence. A warning is also issued when a conflict exists. This section explains which length is used.
If you specify a starting position and ending position for one of these fields, then the length of the field is determined by these specifications. If you specify a length as part of the datatype and do not give an ending position, the field has the given length. If starting position, ending position, and length are all specified, and the lengths differ; then the length given as part of the datatype specification is used for the length of the field. For example, if
position(1:10) char(15)
is specified, then the length of the field is 15.
If a delimited field is specified with a length, or if a length can be calculated from the starting and ending position, then that length is the maximum length of the field. The actual length can vary up to that maximum, based on the presence of the delimiter. If a starting and ending position are both specified for the field and if a field length is specified in addition, then the specified length value overrides the length calculated from the starting and ending position.
If the expected delimiter is absent and no maximum length has been specified, then the end of record terminates the field. If TRAILING NULLCOLS is specified, remaining fields are null. If either the delimiter or the end of record produce a field that is longer than the specified maximum, SQL*Loader generates an error.
The length of a date field depends on the mask, if a mask is specified. The mask provides a format pattern, telling SQL*Loader how to interpret the data in the record. For example, if the mask is specified as:
"Month dd, yyyy"
then "May 3, 1991" would occupy 11 character positions in the record, while "January 31, 1992" would occupy 16.
If starting and ending positions are specified, however, then the length calculated from the position specification overrides a length derived from the mask. A specified length such as "DATE (12)" overrides either of those. If the date field is also specified with terminating or enclosing delimiters, then the length specified in the control file is interpreted as a maximum length for the field.
When a datafile created on one platform is to be loaded on a different platform, the data must be written in a form that the target system can read. For example, if the source system has a native, floating-point representation that uses 16 bytes, and the target system's floating-point numbers are 12 bytes, there is no way for the target system to directly read data generated on the source system.
The best solution is to load data across a Net8 database link, taking advantage of the automatic conversion of datatypes. This is the recommended approach, whenever feasible.
Problems with inter-platform loads typically occur with native datatypes. In some situations, it is possible to avoid problems by lengthening a field by padding it with zeros, or to read only part of the field to shorten it. (For example, when an 8-byte integer is to be read on a system that uses 4-byte integers, or vice versa.) Note, however, that incompatible byte-ordering or incompatible datatype implementation, may prevent this.
If you cannot use a Net8 database link, it is advisable to use only the CHAR, DATE, VARCHARC, and NUMERIC EXTERNAL datatypes. Datafiles written using these datatypes are longer than those written with native datatypes. They may take more time to load, but they transport more readily across platforms. However, where incompatible byte-ordering is an issue, special filters may still be required to reorder the data.
The determination of bind array size pertains to SQL*Loader's conventional path option. It does not apply to the direct path load method. Because a direct path load formats database blocks directly, rather than using Oracle's SQL interface, it does not use a bind array.
SQL*Loader uses the SQL array-interface option to transfer data to the database. Multiple rows are read at one time and stored in the bind array. When SQL*Loader sends Oracle an INSERT command, the entire array is inserted at one time. After the rows in the bind array are inserted, a COMMIT is issued.
The bind array has to be large enough to contain a single row. If the maximum row length exceeds the size of the bind array, as specified by the BINDSIZE parameter, SQL*Loader generates an error. Otherwise, the bind array contains as many rows as can fit within it, up to the limit set by the value of the ROWS parameter.
The BINDSIZE and ROWS parameters are described in Command-Line Keywords.
Although the entire bind array need not be in contiguous memory, the buffer for each field in the bind array must occupy contiguous memory. If the operating system cannot supply enough contiguous memory to store a field, SQL*Loader generates an error.
To minimize the number of calls to Oracle and maximize performance, large bind arrays are preferable. In general, you gain large improvements in performance with each increase in the bind array size up to 100 rows. Increasing the bind array size above 100 rows generally delivers more modest improvements in performance. So the size (in bytes) of 100 rows is typically a good value to use. The remainder of this section details the method for determining that size.
In general, any reasonably large size will permit SQL*Loader to operate effectively. It is not usually necessary to perform the detailed calculations described in this section. This section should be read when maximum performance is desired, or when an explanation of memory usage is needed.
When you specify a bind array size using the command-line parameter BINDSIZE (see BINDSIZE (maximum size)) or the OPTIONS clause in the control file (see OPTIONS), you impose an upper limit on the bind array. The bind array never exceeds that maximum.
As part of its initialization, SQL*Loader determines the space required to load a single row. If that size is too large to fit within the specified maximum, the load terminates with an error.
SQL*Loader then multiplies that size by the number of rows for the load, whether that value was specified with the command-line parameter ROWS (see ROWS (rows per commit)) or the OPTIONS clause in the control file (see OPTIONS).
If that size fits within the bind array maximum, the load continues--SQL*Loader does not try to expand the number of rows to reach the maximum bind array size. If the number of rows and the maximum bind array size are both specified, SQL*Loader always uses the smaller value for the bind array.
If the maximum bind array size is too small to accommodate the initial number of rows, SQL*Loader uses a smaller number of rows that fits within the maximum.
The bind array's size is equivalent to the number of rows it contains times the maximum length of each row. The maximum length of a row is equal to the sum of the maximum field lengths, plus overhead.
bind array size = (number of rows) * (maximum row length)
where:
(maximum row length) = SUM(fixed field lengths) + SUM(maximum varying field lengths) + SUM(overhead for varying length fields)
Many fields do not vary in size. These fixed-length fields are the same for each loaded row. For those fields, the maximum length of the field is the field size, in bytes, as described in SQL*Loader Datatypes. There is no overhead for these fields.
The fields that can vary in size from row to row are
VARCHAR VARGRAPHIC CHAR DATE numeric EXTERNAL
The maximum length of these datatypes is described in SQL*Loader Datatypes. The maximum lengths describe the number of bytes, or character positions, that the fields can occupy in the input data record. That length also describes the amount of storage that each field occupies in the bind array, but the bind array includes additional overhead for fields that can vary in size.
When the character datatypes (CHAR, DATE, and numeric EXTERNAL) are specified with delimiters, any lengths specified for these fields are maximum lengths. When specified without delimiters, the size in the record is fixed, but the size of the inserted field may still vary, due to whitespace trimming. So internally, these datatypes are always treated as varying-length fields--even when they are fixed-length fields.
A length indicator is included for each of these fields in the bind array. The space reserved for the field in the bind array is large enough to hold the longest possible value of the field. The length indicator gives the actual length of the field for each row.
In summary:
bind array size = (number of rows) * ( SUM(fixed field lengths) + SUM(maximum varying field lengths) + ( (number of varying length fields) * (size of length-indicator) ) )
On most systems, the size of the length indicator is two bytes. On a few systems, it is three bytes. To determine its size, use the following control file:
OPTIONS (ROWS=1) LOAD DATA INFILE * APPEND INTO TABLE DEPT (deptno POSITION(1:1) CHAR) BEGINDATA a
This control file "loads" a one-character field using a one-row bind array. No data is actually loaded, due to the numeric conversion error that occurs when "a" is loaded as a number. The bind array size shown in the log file, minus one (the length of the character field) is the value of the length indicator.
Note: A similar technique can determine bind array size without doing any calculations. Run your control file without any data and with ROWS=1 to determine the memory requirements for a single row of data. Multiply by the number of rows you want in the bind array to get the bind array size.
The following tables summarize the memory requirements for each datatype. "L" is the length specified in the control file. "P" is precision. "S" is the size of the length indicator. For more information on these values, see SQL*Loader Datatypes.
Datatype | Size |
INTEGER |
|
SMALLINT |
|
FLOAT |
|
DOUBLE |
|
Datatype | Default Size | Specified Size |
---|---|---|
(packed) DECIMAL |
None |
(P+1)/2, rounded up |
ZONED |
None |
P |
RAW |
None |
L |
CHAR (no delimiters) |
1 |
|
DATE (no delimiters) |
None |
|
numeric EXTERNAL (no delimiters) |
None |
|
Datatype | Default Size |
Length Specified with POSITION |
Length Specified with DATATYPE |
GRAPHIC |
None |
L |
2*L |
GRAPHIC |
None |
L - 2 |
2*(L-2) |
VARGRAPHIC |
4Kb*2 |
L+S |
(2*L)+S |
Datatype | Default Size |
Maximum Length Specified (L) |
VARCHAR |
4Kb |
L+S |
CHAR (delimited) |
|
|
Pay particular attention to the default sizes allocated for VARCHAR, VARGRAPHIC, and the delimited forms of CHAR, DATE, and numeric EXTERNAL fields. They can consume enormous amounts of memory--especially when multiplied by the number of rows in the bind array. It is best to specify the smallest possible maximum length for these fields. For example:
CHAR(10) TERMINATED BY ","
uses (10 + 2) * 64 = 768 bytes in the bind array, assuming that the length indicator is two bytes long. However:
CHAR TERMINATED BY ","
uses (255 + 2) * 64 = 16,448 bytes, because the default maximum size for a delimited field is 255. This can make a considerable difference in the number of rows that fit into the bind array.
When calculating a bind array size for a control file that has multiple INTO TABLE statements, calculate as if the INTO TABLE statements were not present. Imagine all of the fields listed in the control file as one, long data structure -- that is, the format of a single row in the bind array.
If the same field in the data record is mentioned in multiple INTO TABLE clauses, additional space in the bind array is required each time it is mentioned. So, it is especially important to minimize the buffer allocations for fields like these.
Generated data is produced by the SQL*Loader functions CONSTANT, RECNUM, SYSDATE, and SEQUENCE. Such generated data does not require any space in the bind array.
If you want all inserted values for a given column to be null, omit the column's specifications entirely. To set a column's values conditionally to null based on a test of some condition in the logical record, use the NULLIF clause; see NULLIF Keyword. To set a numeric column to zero instead of NULL, use the DEFAULTIF clause, described next.
Using DEFAULTIF on numeric data sets the column to zero when the specified field condition is true. Using DEFAULTIF on character (CHAR or DATE) data sets the column to null (compare with Numeric External Datatypes). See also Specifying Field Conditions for details on the conditional tests.
DEFAULTIF field_condition
A column may have both a NULLIF clause and a DEFAULTIF clause, although this often would be redundant.
Note: The same effects can be achieved with the SQL string and the DECODE function. See Applying SQL Operators to Fields
Use the NULLIF keyword after the datatype and optional delimiter specification, followed by a condition. The condition has the same format as that specified for a WHEN clause. The column's value is set to null if the condition is true. Otherwise, the value remains unchanged.
NULLIF field_condition
The NULLIF clause may refer to the column that contains it, as in the following example:
COLUMN1 POSITION(11:17) CHAR NULLIF (COLUMN1 = "unknown")
This specification may be useful if you want certain data values to be replaced by nulls. The value for a column is first determined from the datafile. It is then set to null just before the insert takes place. Case 6: Loading Using the Direct Path Load Method provides examples of the NULLIF clause.
Note: The same effect can be achieved with the SQL string and the NVL function. See Applying SQL Operators to Fields.
When the control file specifies more fields for a record than are present in the record, SQL*Loader must determine whether the remaining (specified) columns should be considered null or whether an error should be generated. The TRAILING NULLCOLS clause, described in TRAILING NULLCOLS, explains how SQL*Loader proceeds in this case.
Totally blank fields for numeric or DATE fields cause the record to be rejected. To load one of these fields as null, use the NULLIF clause with the BLANKS keyword, as described in the section Comparing Fields to BLANKS. Case 6: Loading Using the Direct Path Load Method provides examples of how to load all-blank fields as null with the NULLIF clause.
If an all-blank CHAR field is surrounded by enclosure delimiters, then the blanks within the enclosures are loaded. Otherwise, the field is loaded as null. More details on whitespace trimming in character fields are presented in the following section.
Blanks and tabs constitute whitespace. Depending on how the field is specified, whitespace at the start of a field (leading whitespace) and at the end of a field (trailing whitespace) may, or may not be, included when the field is inserted into the database. This section describes the way character data fields are recognized, and how they are loaded. In particular, it describes the conditions under which whitespace is trimmed from fields.
Note: Specifying PRESERVE BLANKS changes this behavior. See Preserving Whitespace for more information.
The information in this section applies only to fields specified with one of the character-data datatypes:
Although VARCHAR fields also contain character data, these fields are never trimmed. A VARCHAR field includes all whitespace that is part of the field in the datafile.
There are two ways to specify field length. If a field has a constant length that is defined in the control file, then it has a predetermined size. If a field's length is not known in advance, but depends on indicators in the record, then the field is delimited.
Fields that have a predetermined size are specified with a starting position and ending position, or with a length, as in the following examples:
loc POSITION(19:31) loc CHAR(14)
In the second case, even though the field's exact position is not specified, the field's length is predetermined.
Delimiters are characters that demarcate field boundaries. Enclosure delimiters surround a field, like the quotes in:
"__aa__"
where "__" represents blanks or tabs. Termination delimiters signal the end of a field, like the comma in:
__aa__,
Delimiters are specified with the control clauses TERMINATED BY and ENCLOSED BY, as shown in the following examples:
loc POSITION(19) TERMINATED BY "," loc POSITION(19) ENCLOSED BY '"' loc TERMINATED BY "." OPTIONALLY ENCLOSED BY '|'
If predetermined size is specified for a delimited field, and the delimiter is not found within the boundaries indicated by the size specification; then an error is generated. For example, if you specify:
loc POSITION(19:31) CHAR TERMINATED BY ","
and no comma is found between positions 19 and 31 of the input record, then the record is rejected. If a comma is found, then it delimits the field.
When a starting position is not specified for a field, it begins immediately after the end of the previous field. Figure 5-1 illustrates this situation when the previous field has a predetermined size.
If the previous field is terminated by a delimiter, then the next field begins immediately after the delimiter, as shown in Figure 5-2.
When a field is specified both with enclosure delimiters and a termination delimiter, then the next field starts after the termination delimiter, as shown in Figure 5-3. If a non-whitespace character is found after the enclosure delimiter, but before the terminator, then SQL*Loader generates an error.
In Figure 5-3, both fields are stored with leading whitespace. Fields do not include leading whitespace in the following cases:
These cases are illustrated in the following sections.
If the previous field is TERMINATED BY WHITESPACE, then all the whitespace after the field acts as the delimiter. The next field starts at the next non-whitespace character. Figure 5-4 illustrates this case.
This situation occurs when the previous field is explicitly specified with the TERMINATED BY WHITESPACE clause, as shown in the example. It also occurs when you use the global FIELDS TERMINATED BY WHITESPACE clause.
Leading whitespace is also removed from a field when optional enclosure delimiters are specified but not present.
Whenever optional enclosure delimiters are specified, SQL*Loader scans forward, looking for the first delimiter. If none is found, then the first non-whitespace character signals the start of the field. SQL*Loader skips over whitespace, eliminating it from the field. This situation is shown in Figure 5-5.
Unlike the case when the previous field is TERMINATED BY WHITESPACE, this specification removes leading whitespace even when a starting position is specified for the current field.
Note: If enclosure delimiters are present, leading whitespace after the initial enclosure delimiter is kept, but whitespace before this delimiter is discarded. See the first quote in FIELD1, Figure 5-5.
Trailing whitespace is only trimmed from character-data fields that have a predetermined size. It is always trimmed from those fields.
If a field is enclosed, or terminated and enclosed, like the first field shown in Figure 5-5, then any whitespace outside the enclosure delimiters is not part of the field. Any whitespace between the enclosure delimiters belongs to the field, whether it is leading or trailing whitespace.
Table 5-5 summarizes when and how whitespace is removed from input data fields when PRESERVE BLANKS is not specified. See the following section, Preserving Whitespace, for details on how to prevent trimming.
To prevent whitespace trimming in all CHAR, DATE, and NUMERIC EXTERNAL fields, you specify PRESERVE BLANKS in the control file. Whitespace trimming is described in the previous section, Trimming Blanks and Tabs.
PRESERVE BLANKS retains leading whitespace when optional enclosure delimiters are not present. It also leaves trailing whitespace intact when fields are specified with a predetermined size. This keyword preserves tabs and blanks; for example, if the field
__aa__,
(where underscores represent blanks) is loaded with the following control clause:
TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
then both the leading whitespace and the trailing whitespace are retained if PRESERVE BLANKS is specified. Otherwise, the leading whitespace is trimmed.
Note: The word BLANKS is not optional. Both words must be specified.
When the previous field is terminated by whitespace, then PRESERVE BLANKS does not preserve the space at the beginning of the next field, unless that field is specified with a POSITION clause that includes some of the whitespace. Otherwise, SQL*Loader scans past all whitespace at the end of the previous field until it finds a non-blank, non-tab character.
A wide variety of SQL operators may be applied to field data with the SQL string. This string may contain any combination of SQL expressions that are recognized by Oracle as valid for the VALUES clause of an INSERT statement. In general, any SQL function that returns a single value may be used. See the section "Expressions" in the "Operators, Functions, Expressions, Conditions chapter in the Oracle8i SQL Reference.
The column name and the name of the column in the SQL string must match exactly, including the quotation marks, as in this example of specifying the control file:
LOAD DATA INFILE * APPEND INTO TABLE XXX ( "LAST" position(1:7) char "UPPER(:\"LAST\)", FIRST position(8:15) char "UPPER(:FIRST)" ) BEGINDATA Phil Locke Jason Durbin
The SQL string must be enclosed in double quotation marks. In the example above, LAST must be in quotation marks because it is a SQL*Loader keyword. FIRST is not a SQL*Loader keyword and therefore does not require quotation marks. To quote the column name in the SQL string, you must escape it.
The SQL string appears after any other specifications for a given column. It is evaluated after any NULLIF or DEFAULTIF clauses, but before a DATE mask. It may not be used on RECNUM, SEQUENCE, CONSTANT, or SYSDATE fields. If the RDBMS does not recognize the string, the load terminates in error. If the string is recognized, but causes a database error, the row that caused the error is rejected.
To refer to fields in the record, precede the field name with a colon (:). Field values from the current record are substituted. The following examples illustrate references to the current field:
field1 POSITION(1:6) CHAR "LOWER(:field1)" field1 CHAR TERMINATED BY ',' NULLIF ((1) = 'a') DEFAULTIF ((1)= 'b') "RTRIM(:field1)" field1 CHAR(7) "TRANSLATE(:field1, ':field1', ':1')"
In the last example, only the :field1 that is not in single quotes is interpreted as a column name. For more information on the use of quotes inside quoted strings, see Specifying Filenames and Objects Names.
field1 POSITION(1:4) INTEGER EXTERNAL "decode(:field2, '22', '34', :field1)"
Note: SQL strings cannot reference fields in column objects or fields that are loaded using OID, SID, REF, or BFILE. Also, they cannot reference filler fields.
Other fields in the same record can also be referenced, as in this example:
field1 POSITION(1:4) INTEGER EXTERNAL "decode(:field2, '22', '34', :field1)"
Loading external data with an implied decimal point:
field1 POSITION(1:9) DECIMAL EXTERNAL(8) ":field1/1000"
Truncating fields that could be too long:
field1 CHAR TERMINATED BY "," "SUBSTR(:field1, 1, 10)"
Multiple operators can also be combined, as in the following examples:
field1 POSITION(*+3) INTEGER EXTERNAL "TRUNC(RPAD(:field1,6,'0'), -2)" field1 POSITION(1:8) INTEGER EXTERNAL "TRANSLATE(RTRIM(:field1),'N/A', '0')" field1 CHARACTER(10) "NVL( LTRIM(RTRIM(:field1)), 'unknown' )"
When used with a date mask, the date mask is evaluated after the SQL string. A field specified as:
field1 DATE 'dd-mon-yy' "RTRIM(:field1)"
would be inserted as:
TO_DATE(RTRIM(<field1_value>), 'dd-mon-yyyy')
It is possible to use the TO_CHAR operator to store formatted dates and numbers. For example:
field1 ... "TO_CHAR(:field1, '$09999.99')"
could store numeric input data in formatted form, where field1 is a character column in the database. This field would be stored with the formatting characters (dollar sign, period, and so on) already in place.
You have even more flexibility, however, if you store such values as numeric quantities or dates. You can then apply arithmetic functions to the values in the database, and still select formatted values for your reports.
The SQL string is used in Case 7: Extracting Data from a Formatted Report to load data from a formatted report.
Column object in the control file are described in terms of their attributes. In the datafile, the data corresponding to each of the attributes of a column-object is in a datafield similar to that corresponding to a simple relational column.
Following are some examples of loading column objects. First, where the data is in predetermined size fields and second, where the data is in delimited fields.
LOAD DATA INFILE 'sample.dat' INTO TABLE departments (dept_no POSITION(01:03) CHAR, dept_name POSITION(05:15) CHAR, 1 dept_mgr COLUMN OBJECT (name POSITION(17:33) CHAR, age POSITION(35:37) INTEGER EXTERNAL, emp_id POSITION(40:46) INTEGER EXTERNAL) )
101 Mathematics Johny Quest 30 1024 237 Physics Albert Einstein 65 0000
Note:
LOAD DATA INFILE 'sample.dat' "var 6" INTO TABLE departments FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' 1 (dept_no dept_name, dept_mgr COLUMN OBJECT (name CHAR(30), age INTEGER EXTERNAL(5), emp_id INTEGER EXTERNAL(5)) )
2 000034101,Mathematics,Johny Q.,30,1024, 000039237,Physics,"Albert Einstein",65,0000,
Notes:
Example 5-3 shows a control file describing nested column-objects (one column-object nested in another column-object).
LOAD DATA INFILE `sample.dat' INTO TABLE departments_v2 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' (dept_no CHAR(5), dept_name CHAR(30), dept_mgr COLUMN OBJECT (name CHAR(30), age INTEGER EXTERNAL(3), emp_id INTEGER EXTERNAL(7), 1 em_contact COLUMN OBJECT (name CHAR(30), phone_num CHAR(20))))
101,Mathematics,Johny Q.,30,1024,"Barbie",650-251-0010, 237,Physics,"Albert Einstein",65,0000,Wife Einstein,654-3210,
Note:
Specifying null values for non-scalar datatypes is somewhat more complex than for scalar datatypes. An object can have a subset of its attributes be null, it can have all of its attributes be null (an attributively null object), or it can be null itself (an atomically null object).
In fields corresponding to object columns, you can use the NULLIF clause to specify the field conditions under which a particular attribute should be initialized to null. Example 5-4 demonstrates this.
LOAD DATA INFILE 'sample.dat' INTO TABLE departments (dept_no POSITION(01:03) CHAR, dept_name POSITION(05:15) CHAR NULLIF dept_name=BLANKS, dept_mgr COLUMN OBJECT 1 ( name POSITION(17:33) CHAR NULLIF dept_mgr.name=BLANKS, 1 age POSITION(35:37) INTEGER EXTERNAL NULLIF dept_mgr.age=BLANKS, 1 emp_id POSITION(40:46) INTEGER EXTERNAL NULLIF dept_mgr.emp_id=BLANKS))
2 101 Johny Quest 1024 237 Physics Albert Einstein 65 0000
Notes:
dept_mgr
value is null. The dept_name
value is also null.
To specify in the control file the condition under which a particular object should take null value (atomic null), you must follow that object's name with a NULLIF clause based on a logical combination of any of the mapped fields (for example, in Specifying NULL Values for Objects , the named mapped fields would be dept_no
, dept_name
, name
, age
, emp_id
, but dept_mgr
would not be a named mapped field because it does not correspond (is not mapped to) any field in the datafile).
Although the above is workable, it is not ideal when the condition under which an object should take the value of null is independent of any of the mapped fields. In such situations, you can use filler fields (see Secondary Data Files (SDFs) and LOBFILES).
You can map a filler field to the field in the datafile (indicating if a particular object is atomically null or not) and use the filler filed in the field condition of the NULLIF clause of the particular object.
For example:
LOAD DATA INFILE 'sample.dat' INTO TABLE departments_v2 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' (dept_no CHAR(5), dept_name CHAR(30), 1 is_null FILLER CHAR, 2 dept_mgr COLUMN OBJECT NULLIF is_null=BLANKS (name CHAR(30) NULLIF dept_mgr.name=BLANKS, age INTEGER EXTERNAL(3) NULLIF dept_mgr.age=BLANKS, emp_id INTEGER EXTERNAL(7) NULLIF dept_mgr.emp_id=BLANKS, em_contact COLUMN OBJECT NULLIF is_null2=BLANKS (name CHAR(30) NULLIF dept_mgr.em_contact.name=BLANKS, phone_num CHAR(20) NULLIF dept_mgr.em_contact.phone_num=BLANKS)), 1) is_null2 FILLER CHAR)
101,Mathematics,n,Johny Q.,,1024,"Barbie",608-251-0010,, 237,Physics,,"Albert Einstein",65,0000,,650-654-3210,n,
Notes:
The control file syntax required to load an object table is nearly identical to that used to load a typical relational table. Example 5-6 demonstrates loading an object table with primary key OIDs.
LOAD DATA INFILE 'sample.dat' DISCARDFILE 'sample.dsc' BADFILE 'sample.bad' REPLACE INTO TABLE employees FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' (name CHAR(30) NULLIF name=BLANKS, age INTEGER EXTERNAL(3) NULLIF age=BLANKS, emp_id INTEGER EXTERNAL(5))
Johny Quest, 18, 007, Speed Racer, 16, 000,
Note that by looking only at the above control file you might not be able to determine if the table being loaded was an object table with system generated OIDs (real OIDs), an object table with primary key OIDs, or a relational table.
Note also that you may want to load data which already contains real OIDs and may want to specify that, instead of generating new OIDs, the existing OIDs in the datafile should be used. To do this, you would follow the INTO TABLE clause with the OID clause:
:= OID (<fieldname>)
where <fieldname> is the name of one of the fields (typically a filler field) from the field specification list which is mapped to a datafield that contains the real OIDs. SQL*Loader assumes that the OIDs provided are in the correct format and that they preserve OID global uniqueness. Therefore, you should use the oracle OID generator to generate the OIDs to be loaded to insure uniqueness. Note also that the OID clause can only be used for system-generated OIDs, not primary key OIDs.
Example 5-7 demonstrates loading real OIDs with the row-objects.
LOAD DATA INFILE 'sample.dat' INTO TABLE employees_v2 1 OID (s_oid) FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' (name CHAR(30 NULLIF name=BLANKS, age INTEGER EXTERNAL(3) NULLIF age=BLANKS, emp_id INTEGER EXTERNAL(5) 2 s_oid FILLER CHAR(32)
3 Johny Quest, 18, 007, 21E978406D3E41FCE03400400B403BC3, Speed Racer, 16, 000, 21E978406D4441FCE03400400B403BC3,
Notes:
SQL Loader can load real REF columns (REFs containing real OIDs of the referenced objects) as well as primary key REF columns:
SQL*Loader assumes, when loading real REF columns, that the actual OIDs from which the REF columns are to be constructed are in the datafile with the rest of the data. The description of the field corresponding to a REF column consists of the column name followed by the REF directive.
The REF directive takes as arguments the table name and an OID. Note that the arguments can be specified either as constants or dynamically (using filler fields). See REF_spec for the appropriate syntax. Example 5-8 demonstrates real REF loading:
LOAD DATA INFILE `sample.dat' INTO TABLE departments_alt_v2 FIELDS TERMINATED BY `,' OPTIONALLY ENCLOSED BY `"' (dept_no CHAR(5), dept_name CHAR(30), 1) dept_mgr REF(t_name, s_oid), s_oid FILLER CHAR(32), t_name FILLER CHAR(30))
22345, QuestWorld, 21E978406D3E41FCE03400400B403BC3, EMPLOYEES_V2, 23423, Geography, 21E978406D4441FCE03400400B403BC3, EMPLOYEES_V2,
Note
dept_mgr
field itself does not map to any field in the datafile.
To load a primary key REF column, the SQL*Loader control-file field description must provide the column name followed by a REF directive. The REF directive takes for arguments a comma separated list of field names/constant values. The first argument is the table name followed by arguments that specify the primary key OID on which the REF column to be loaded is based. See REF_spec for the appropriate syntax.
Note that SQL*Loader assumes the ordering of the arguments matches the relative ordering of the columns making up the primary key OID in the referenced table. Example 5-9 demonstrates loading primary key REFs:
LOAD DATA INFILE `sample.dat' INTO TABLE departments_alt FIELDS TERMINATED BY `,' OPTIONALLY ENCLOSED BY `"' (dept_no CHAR(5), dept_name CHAR(30), dept_mgr REF(CONSTANT `EMPLOYEES', emp_id), emp_id FILLER CHAR(32))
22345, QuestWorld, 007, 23423, Geography, 000,
The following sections discuss using SQL*Loader to load internal LOBs: BLOBs, CLOBs, NCLOBs, external LOBs and BFILEs.
Because LOBs can be quite large, SQL*Loader is able to load LOB data from either the main datafile (inline with the rest of the data) or from LOBFILEs. See Loading LOB Data Using LOBFILEs.
To load LOB data from the main datafile, you can use the standard SQL*Loader formats. The LOB data instances can be in predetermined size fields, delimited fields, or length-value pair fields. The following examples illustrate these situations.
This is a very fast and conceptually simple format in which to load LOBs.
Note: Because the LOBs you are loading may not be of equal size, you can use whitespace to pad the LOB data to make the LOBs all of equal length within a particular datafield. For more information on trimming trailing whitespaces see Trimming Whitespace: Summary.
To load LOBs using this format, you should use either CHAR or RAW as the loading datatype.
LOAD DATA INFILE 'sample.dat' "fix 501" INTO TABLE person_table (name POSITION(01:21) CHAR, 1 "RESUME" POSITION(23:500) CHAR DEFAULTIF "RESUME"=BLANKS)
Johny Quest Johny Quest 500 Oracle Parkway jquest@us.oracle.com ...
Note:
This format handles LOBs of different sizes within the same column (datafile field) without problem. Note, however, that this added flexibility can impact performance because SQL*Loader must scan through the data, looking for the delimiter string. See Secondary Data Files (SDFs) and LOBFILES.
LOAD DATA INFILE 'sample.dat' "str '|'" INTO TABLE person_table FIELDS TERMINATED BY ',' (name CHAR(25), 1 "RESUME" CHAR(507) ENCLOSED BY '<startlob>' AND '<endlob>')
Johny Quest,<startlob> Johny Quest 500 Oracle Parkway jquest@us.oracle.com ... <endlob> 2 |Speed Racer, .......
Notes:
<startlob>
and <endlob>
are the enclosure strings. Note that the maximum length for a LOB that can be read using the CHAR(507) is 507 bytes.
'|'
had been placed right after <endlob>
and followed with the newline character, the newline would have been interpreted as part of the next record. An alternative would be to make the newline part of the record separator (for example, '|\n'
or, in hex, X'7C0A'
).
You can use VARCHAR (see VARCHAR), VARCHARC or VARRAW datatypes (see Discarded and Rejected Records) to load LOB data organized in length-value pair fields. Note that this method of loading provides better performance than using delimited fields, but can reduce flexibility (for example, you must know the LOB length for each LOB before loading). Example 5-12 demonstrates loading LOB data in length-value pair fields.
LOAD DATA INFILE 'sample.dat' "str '<endrec>\n'" INTO TABLE person_table FIELDS TERMINATED BY ',' (name CHAR(25), 1 "RESUME" VARCHARC(3,500))
Johny Quest,479 Johny Quest 500 Oracle Parkway jquest@us.oracle.com ... <endrec> 2 3 Speed Racer,000<endrec>
Notes:
LOB data can be lengthy enough that it makes sense to load it from a LOBFILE. In LOBFILEs, LOB data instances are still considered to be in fields (predetermined size, delimited, length-value), but these fields are not organized into records (the concept of a record does not exist within LOBFILEs). Therefore, the processing overhead of dealing with records is avoided. This type of organization of data is ideal for LOB loading.
In Example 5-13, each LOBFILE is the source of a single LOB. To load LOB data that is organized in this way, you would follow the column/field name with the LOBFILE datatype specifications. For example:
LOAD DATA INFILE 'sample.dat' INTO TABLE person_table FIELDS TERMINATED BY ',' (name CHAR(20), 1 ext_fname FILLER CHAR(40), 2 "RESUME" LOBFILE(ext_fname) TERMINATED BY EOF)
Johny Quest,jqresume.txt, Speed Racer,'/private/sracer/srresume.txt',
Johny Quest 500 Oracle Parkway ...
Speed Racer 400 Oracle Parkway ...
Notes:
ext_fname
filler field. It then loads the data from the LOBFILE (using the CHAR datatype) from the first byte to the EOF character, whichever is reached first. Note that if no existing LOBFILE is specified, the "RESUME" field is initialized to empty. See also Dynamic Versus Static LOBFILE and SDF Specifications.
In Example 5-14, you specify the size of the LOBs to be loaded into a particular column in the control file. During the load, SQL*Loader assumes that any LOB data loaded into that particular column is of the specified size. The predetermined size of the fields allows the data-parser to perform optimally. One difficulty is that it is often hard to guarantee that all the LOBs are of the same size.
LOAD DATA INFILE 'sample.dat' INTO TABLE person_table FIELDS TERMINATED BY ',' (name CHAR(20), ext_fname FILLER CHAR(40), 1 "RESUME" LOBFILE(CONSTANT '/usr/private/jquest/jqresume') CHAR(2000))
Johny Quest, Speed Racer,
Johny Quest 500 Oracle Parkway ... Speed Racer 400 Oracle Parkway ...
Note:
'jqresume.txt'
LOBFILE, using the CHAR datatype, starting with the byte following the byte loaded last during the current loading session.
In Example 5-15, the LOB data instances in the LOBFILE are delimited. In this format, loading different size LOBs into the same column is not a problem. Keep in mind that this added flexibility can impact performance because SQL*Loader must scan through the data, looking for the delimiter string.
LOAD DATA INFILE 'sample.dat' INTO TABLE person_table FIELDS TERMINATED BY ',' (name CHAR(20), 1 "RESUME" LOBFILE( CONSTANT 'jqresume') CHAR(2000) TERMINATED BY "<endlob>\n")
Johny Quest, Speed Racer,
Johny Quest 500 Oracle Parkway ... <endlob> Speed Racer 400 Oracle Parkway ... <endlob>
Note:
max-length
(2000) tells SQL*Loader what to expect as the maximum length of the field which can result in memory usage optimization. If you choose to specify max-length, you should be sure not to underestimate its value. The TERMINATED BY clause specifies the string that terminates the LOBs. Alternatively, you could use the ENCLOSED BY clause. The ENCLOSED BY clause allows a bit more flexibility as to the relative positioning of the LOBs in the LOBFILE (the LOBs in the LOBFILE need not be sequential).
In this example, each LOB in the LOBFILE is preceded by its length. One could use VARCHAR (see VARCHAR), VARCHARC or VARRAW datatypes (see Discarded and Rejected Records) to load LOB data organized in this way.
Note that this method of loading can provide better performance over delimited LOBs, but at the expense of some flexibility (for example, you must know the LOB length for each LOB before loading).
LOAD DATA INFILE 'sample.dat' INTO TABLE person_table FIELDS TERMINATED BY ',' (name CHAR(20), 1 "RESUME" LOBFILE(CONSTANT 'jqresume') VARCHARC(4,2000))
Johny Quest, Speed Racer,
2 0501Johny Quest 500 Oracle Parkway ... 3 0000
Notes:
VARCHARC(4, 2000)
tells SQL*Loader that the LOBs in the LOBFILE are in length-value pair format and that first 4 bytes should be interpreted as the length. max_length
tells SQL*Loader that the maximum size of the field is 2000.
0501
preceding Johnny Quest
tells SQL*Loader that the LOB consists of the next 501 characters.
One should keep in mind the following when loading LOBs from LOBFILES:
max_length
is specified, SQL*Loader uses it as a hint to optimize memory usage. Keep in mind that it is very important that the max_length
specification does not understate the true maximum length.
The BFILE datatype stores unstructured binary data in operating-system files outside the database. A BFILE column or attribute stores a file locator that points to the external file containing the data. Note that the file which is to be loaded as a BFILE does not have to exist at the time of loading, it can be created later. SQL*Loader assumes that the necessary directory objects have already been created (a logical alias name for a physical directory on the server's filesystem). For more information, see the Oracle8i Application Developer's Guide - Large Objects (LOBs).
A control file field corresponding to a BFILE column consists of column name followed by the BFILE clause. The BFILE clause takes as arguments a DIRECTORY OBJECT name followed by a BFILE name, both of which can be provided as string constants, or they can be dynamically loaded through some other field. See the Oracle8i SQL Reference for more information.
In the next two examples of loading BFILES, Example 5-17 has only the filename specified dynamically. Example 5-18 demonstrates specifying both the BFILE and the DIRECTORY OBJECT dynamically.
LOAD DATA INFILE sample.dat INTO TABLE planets FIELDS TERMINATED BY ',' (pl_id CHAR(3), pl_name CHAR(20), fname FILLER CHAR(30), 1) pl_pict BFILE(CONSTANT "scoTT_dir1", fname))
1,Mercury,mercury.jpeg, 2,Venus,venus.jpeg, 3,Earth,earth.jpeg,
Note
LOAD DATA INFILE sample.dat INTO TABLE planets FIELDS TERMINATED BY ',' (pl_id NUMBER(4), pl_name CHAR(20), fname FILLER CHAR(30), 1) dname FILLER CHAR(20)); pl_pict BFILE(dname, fname),
1, Mercury, mercury.jpeg, scott_dir1, 2, Venus, venus.jpeg, scott_dir1, 3, Earth, earth.jpeg, scott_dir2,
Note
Like LOBs, collections can also be loaded either from the main datafile (data inline) or from secondary datafile(s) (data outofline). See Secondary Data Files (SDFs).
When loading collection data, a mechanism must exist by which SQL*Loader can tell when the data belonging to a particular collection instance has ended. You can achieve this in two ways:
In the control file, collections are described similarly to column objects (see Loading Column Objects). There are some differences:
name
, age
, and empid
, could not be used in a field condition specification of a NULLIF or a DEFAULTIF clause for dept_no
, dname
, emp_cnt
, emps
or projects
.
See SQL*Loader's Data Definition Language (DDL) Syntax Diagrams for syntax diagrams of both nested tables and VARRAYs.
Example 5-19 demonstrates loading a varray and a nested table.
LOAD DATA INFILE `sample.dat' "str `|\n' " INTO TABLE dept FIELDS TERMINATED BY `,' OPTIONALLY ENCLOSED BY `"' TRAILING NULLCOLS ( dept_no CHAR(3), dname CHAR(20) NULLIF dname=BLANKS , 1) emp_cnt FILLER INTEGER EXTERNAL(5), 2) emps VARRAY COUNT(emp_cnt) 3) (name) FILLER CHAR(10), emps COLUMN OBJECT NULLIF emps.emps.name=BLANKS (name) CHAR(30), age INTEGER EXTERNAL(3), emp_id CHAR(7) NULLIF emps.emps.emp_id=BLANKS)), mysid FILLER CHAR(32), 4) projects NESTED TABLE SDF(CONSTANT `pr.txt' "fix 71") SID(mysid) TERMINATED BY ";" (project_id POSITION(1:5) INTEGER EXTERNAL(5), project_name POSITION(7:30) CHAR NULLIF projects.project_name=BLANKS, p_desc_src FILLER POSITION(35:70) CHAR, 5) proj_desc LOBFILE( projects.p_desc_src) CHAR(2000) TERMINATED BY "<>\n"))
101,Math,2, ,"J. H.",28,2828, ,"Cy",123,9999,21E978407D4441FCE03400400B403BC3| 6) 210,"Topologic Transforms", ,21E978408D4441FCE03400400B403BC3|
21034 Topological Transforms '/mydir/projdesc.txt'; 7) 77777 Impossible Proof;
8) Topological Transforms equate ...........<> If there is more then one LOB in the file, it starts here .....<>
Notes:
emp_cnt
is a filler field used as an argument to the COUNT clause.
p_desc_src
is null, the DEFAULTIF clause will initialize the proj_desc
LOB to empty. See DEFAULTIF Clause.
emp_cnt
field is null which, under the DEFAULTIF clause, translates to 0 (an empty LOB). Therefore, the set-id is loaded. If "mysid"
does not contain a valid hexadecimal number, the record is rejected. Keep in mind that SQL*Loader performs no other set-id validation.
p_desc_src
is missing but, because the TRAILING NULLCOLS clause is present, the p_desc_src
is initialized to null.
<>
followed by a newline character.
When loading a table which contain a nested table column, it may be possible to load the parent table separately from the child table. You can do independent loading of the parent and child tables if the SIDs (system-generated or user-defined) are already known at the time of the load (i.e. the SIDs are in the datafile with the data).
LOAD DATA INFILE `sample.dat' "str `|\n' " INTO TABLE dept FIELDS TERMINATED BY `,' OPTIONALLY ENCLOSED BY `"' TRAILING NULLCOLS ( dept_no CHAR(3), dname CHAR(20) NULLIF dname=BLANKS , mysid FILLER CHAR(32), 1) projects SID(mysid))
101,Math,21E978407D4441FCE03400400B403BC3,| 210,"Topology",21E978408D4441FCE03400400B403BC3,|
Note:
mysid
is a filler field which is mapped to a datafile field containing the actual set-id's and is supplied as an argument to the SID clause.
LOAD DATA INFILE `sample.dat' INTO TABLE dept FIELDS TERMINATED BY `,' OPTIONALLY ENCLOSED BY `"' TRAILING NULLCOLS 1) SID(sidsrc) project_id INTEGER EXTERNAL(5), project_name CHAR(20) NULLIF project_name=BLANKS, sidsrc FILLER CHAR(32))
21034, "Topological Transforms",21E978407D4441FCE03400400B403BC3, 77777, "Impossible Proof",21E978408D4441FCE03400400B403BC3,
Note
"sidsrc"
is the filler field name which is the source of the real set-id's.