EDB*Loader v15
EDB*Loader is a high-performance bulk data loader that provides an interface compatible with Oracle databases for EDB Postgres Advanced Server. The EDB*Loader command line utility loads data from an input source, typically a file, into one or more tables using a subset of the parameters offered by Oracle SQL*Loader.
EDB*Loader features include:
- Support for the Oracle SQL*Loader data loading methods (conventional path load, direct path load, and parallel direct path load)
- Syntax for control file directives compatible with Oracle SQL*Loader
- Input data with delimiter-separated or fixed-width fields
- Bad file for collecting rejected records
- Loading of multiple target tables
- Discard file for collecting records that don't meet the selection criteria of any target table
- Log file for recording the EDB*Loader session and any error messages
- Data loading from standard input and remote loading, particularly useful for large data sources on remote hosts
Note
The following are important version compatibility restrictions between the EDB*Loader client and the database server.
When you invoke the EDB*Loader program (called
edbldr
), you pass in parameters and directive information to the database server. We strongly recommend that you use the version 14 EDB*Loader client (the edbldr program supplied with EDB Postgres Advanced Server 14) to load data only into version 14 of the database server. In general, use the same version for the EDB*Loader client and database server.Using EDB*Loader with connection poolers such as PgPool-II and PgBouncer isn't supported. EDB*Loader must connect directly to EDB Postgres Advanced Server version 14. Alternatively, the following are some of the commands you can use for loading data through connection poolers:
Data loading methods
As with Oracle SQL*Loader, EDB*Loader supports three data loading methods:
- Conventional path load
- Direct path load
- Parallel direct path load
Conventional path load is the default method used by EDB*Loader. Use basic insert processing to add rows to the table.
The advantage of a conventional path load is that table constraints and database objects defined on the table are enforced during a conventional path load. Table constraints and database objects include primary keys, not null constraints, check constraints, unique indexes, foreign key constraints, triggers, and so on.
One exception is that the EDB Postgres Advanced Server rules defined on the table aren't enforced. EDB*Loader can load tables on which rules are defined. However, the rules aren't executed. As a consequence, you can't load partitioned tables implemented using rules with EDB*Loader.
Note
Create EDB Postgres Advanced Server rules using the CREATE RULE
command. EDB Postgres Advanced Server rules aren't the same database objects as rules and rule sets used in Oracle.
EDB*Loader also supports direct path loads. A direct path load is faster than a conventional path load but requires removing most types of constraints and triggers from the table. For more information, see Direct path load.
EDB*Loader supports parallel direct path loads. A parallel direct path load provides even greater performance improvement by permitting multiple EDB*Loader sessions to run simultaneously to load a single table. For more information, see Parallel direct path load.
General usage
EDB*Loader can load data files with either delimiter-separated or fixed-width fields in single-byte or multibyte character sets. The delimiter can be a string consisting of one or more single-byte or multibyte characters. Data file encoding and the database encoding can differ. Character set conversion of the data file to the database encoding is supported.
Each EDB*Loader session runs as a single, independent transaction. If an error occurs during the EDB*Loader session that aborts the transaction, all changes made during the session are rolled back.
Generally, formatting errors in the data file don't result in an aborted transaction. Instead, the badly formatted records are written to a text file called the bad file. The reason for the error is recorded in the log file.
Records causing database integrity errors result in an aborted transaction and rollback. As with formatting errors, the record causing the error is written to the bad file and the reason is recorded in the log file.
Note
EDB*Loader differs from Oracle SQL*Loader in that a database integrity error results in a rollback in EDB*Loader. In Oracle SQL*Loader, only the record causing the error is rejected. Records that were previously inserted into the table are retained, and loading continues after the rejected record.
The following are examples of types of formatting errors that don't abort the transaction:
- Attempt to load non-numeric value into a numeric column
- Numeric value too large for a numeric column
- Character value too long for the maximum length of a character column
- Attempt to load improperly formatted date value into a date column
The following are examples of types of database errors that abort the transaction and result in the rollback of all changes made in the EDB*Loader session:
- Violating a unique constraint such as a primary key or unique index
- Violating a referential integrity constraint
- Violating a check constraint
- Error thrown by a trigger fired as a result of inserting rows
Building the EDB*Loader control file
When you invoke EDB*Loader, the list of arguments provided must include the name of a control file. The control file includes the instructions that EDB*Loader uses to load the tables from the input data file. The control file includes information such as:
- The name of the input data file containing the data to load
- The name of the tables to load from the data file
- Names of the columns in the tables and their corresponding field placement in the data file
- Specification of whether the data file uses a delimiter string to separate the fields or if the fields occupy fixed column positions
- Optional selection criteria to choose the records from the data file to load into a given table
- The name of the file that collects illegally formatted records
- The name of the discard file that collects records that don't meet the selection criteria of any table
The syntax for the EDB*Loader control file is:
Where field_def
defines a field in the specified data_file
. The field describes the location, data format, or value of the data to insert into column_name
of target_table
. The syntax of field_def
is:
Where fieldtype
is one of:
Description
The specification of data_file
, bad_file
, and discard_file
can include the full directory path or a relative directory path to the file name. If the file name is specified alone or with a relative directory path, the file is then assumed to exist, in the case of data_file
, relative to the current working directory from which you invoke edbldr. In the case of
bad_fileor
discard_file`, it's created.
You can include references to environment variables in the EDB*Loader control file when referring to a directory path or file name. Environment variable references are formatted differently on Windows systems than on Linux systems:
- On Linux, the format is
$ENV_VARIABLE
or${ENV_VARIABLE}
. - On Windows, the format is
%ENV_VARIABLE%
.
Where ENV_VARIABLE
is the environment variable that's set to the directory path or file name.
Set the EDBLDR_ENV_STYLE
environment variable to interpret environment variable references as Windows-styled references or Linux-styled references regardless of the operating system on which EDB*Loader resides. You can use this environment variable to create portable control files for EDB*Loader.
- On a Windows system, set
EDBLDR_ENV_STYLE
tolinux
orunix
to recognize Linux-style references in the control file. - On a Linux system, set
EDBLDR_ENV_STYLE
towindows
to recognize Windows-style references in the control file.
The operating system account enterprisedb
must have read permission on the directory and file specified by data_file
.
The operating system account enterprisedb must have write permission on the directories where bad_file
and discard_file
are written.
Note
The file names for data_file
, bad_file
, and discard_file
must have extensions .dat
, .bad
, and .dsc
, respectively. If the provided file name doesn't have an extension, EDB*Loader assumes the actual file name includes the appropriate extension.
Suppose an EDB*Loader session results in data format errors, the BADFILE
clause isn't specified, and the BAD parameter isn't given on the command line when edbldr
is invoked. In this case, a bad file is created with the name control_file_base.bad
in the directory from which edbldr
is invoked. control_file_base
is the base name of the control file used in the edbldr
session.
If all of the following conditions are true, the discard file isn't created even if the EDB*Loader session results in discarded records:
- The
DISCARDFILE
clause for specifying the discard file isn't included in the control file. - The
DISCARD
parameter for specifying the discard file isn't included on the command line. - The
DISCARDMAX
clause for specifying the maximum number of discarded records isn't included in the control file. - The
DISCARDS
clause for specifying the maximum number of discarded records isn't included in the control file. - The
DISCARDMAX
parameter for specifying the maximum number of discarded records isn't included on the command line.
Suppose you don't specify the DISCARDFILE
clause and the DISCARD
parameter for explicitly specifying the discard file name, but you do specify DISCARDMAX
or DISCARDS
. In this case, the EDB*Loader session creates a discard file using the data file name with an extension of .dsc
.
Note
The keywords DISCARD
and DISCARDS
differ. DISCARD
is an EDB*Loader command line parameter used to specify the discard file name (see General usage). DISCARDS
is a clause of the LOAD DATA
directive that can appear only in the control file. Keywords DISCARDS
and DISCARDMAX
provide the same functionality of specifying the maximum number of discarded records allowed before terminating the EDB*Loader session. Records loaded into the database before terminating the EDB*Loader session due to exceeding the DISCARDS
or DISCARDMAX
settings are kept in the database. They aren't rolled back.
Specifying one of INSERT
, APPEND
, REPLACE
, or TRUNCATE
establishes the default action for adding rows to target tables. The default action is INSERT
.
If you specify the FIELDS TERMINATED BY
clause, then you can't specify the POSITION (start:end)
clause for any field_def
. Alternatively, if you don't specify the FIELDS TERMINATED BY
clause, then every field_def
must contain the POSITION (start:end)
clause, the fieldtype(length)
clause, or the CONSTANT
clause.
Parameters
OPTIONS param=value
Use the OPTIONS
clause to specify param=value
pairs that represent an EDB*Loader directive. If you specify a parameter in the OPTIONS
clause and on the command line when edbldr
is invoked, the command line setting takes effect.
Specify one or more of the following parameter/value pairs:
DIRECT= { FALSE | TRUE }
With
DIRECT
set toTRUE
, EDB*Loader performs a direct path load instead of a conventional path load. The default value ofDIRECT
isFALSE
.Don't set
DIRECT=true
when loading the data into a replicated table. If you're using EDB*Loader to load data into a replicated table and setDIRECT=true
, indexes might omit rows that are in a table or can potentially contain references to rows that were deleted. Direct inserts to load data into replicated tables aren't supported.For information on direct path loads see, Direct path load.
ERRORS=error_count
error_count
specifies the number of errors permitted before aborting the EDB*Loader session. The default is50
.FREEZE= { FALSE | TRUE }
Set
FREEZE
toTRUE
to copy the data with the rowsfrozen
. A tuple guaranteed to be visible to all current and future transactions is marked as frozen to prevent transaction ID wraparound. For more information about frozen tuples, see the [PostgreSQL core documentation](<https://www.postgresql.org/docs/current/static/routine-vacuuming.html).You must specify a data-loading type of
TRUNCATE
in the control file when using theFREEZE
option.FREEZE
isn't supported for direct loading.By default,
FREEZE
isFALSE
.PARALLEL= { FALSE | TRUE }
Set
PARALLEL
toTRUE
to indicate that this EDB*Loader session is one of a number of concurrent EDB*Loader sessions participating in a parallel direct path load. The default value ofPARALLEL
isFALSE
.When
PARALLEL
isTRUE
, you must also set theDIRECT
parameter toTRUE
. For more information about parallel direct path loads, see Parallel direct path load.ROWS=n
n
specifies the number of rows that EDB*Loader commits before loading the next set ofn
rows.If EDB*Loader encounters an invalid row during a load in which the
ROWS
parameter is specified, those rows committed prior to encountering the error remain in the destination table.SKIP=skip_count
skip_count
specifies the number of records at the beginning of the input data file to skip before loading begins. The default is0
.SKIP_INDEX_MAINTENANCE={ FALSE | TRUE }
If
SKIP_INDEX_MAINTENANCE
isTRUE
, index maintenance isn't performed as part of a direct path load, and indexes on the loaded table are marked as invalid. The default value ofSKIP_INDEX_MAINTENANCE
isFALSE
.Note
During a parallel direct path load, target table indexes aren't updated. They are marked as invalid after the load is complete.
You can use the
REINDEX
command to rebuild an index. For more information about theREINDEX
command, see the PostgreSQL core documentation.
charset
Use the CHARACTERSET
clause to identify the character set encoding of data_file
, where charset
is the character set name. This clause is required if the data file encoding differs from the control file encoding. The control file encoding must always be in the encoding of the client where edbldr
is invoked.
Examples of charset
settings are UTF8
, SQL_ASCII
, and SJIS
.
For more information about client-to-database character-set conversion, see the PostgreSQL core documentation.
data_file
File containing the data to load into target_table
. Each record in the data file corresponds to a row to insert into target_table
.
If you don't include an extension in the file name, EDB*Loader assumes the file has an extension of .dat
, for example, mydatafile.dat
.
!!! Note
If you specify the DATA
parameter on the command line when invoking edbldr
, the file given by the command line DATA
parameter is used instead.
If you omit the INFILE
clause and the command line DATA
parameter, then the data file name is assumed to be the same as the control file name but with the extension .dat
.
stdin
Specify stdin
(all lowercase letters) if you want to use standard input to pipe the data to load directly to EDB*Loader. This technique is useful for data sources generating a large number of records to load.
bad_file
A file that receives data_file
records that can't load due to errors. The bad file is generated for collecting rejected or bad records.
For EDB Postgres Advanced Server version 12 and later, a bad file is generated only if there are any bad or rejected records. However, if an existing bad file has the same name and location, and no bad records are generated after invoking a new version of edbldr
, the existing bad file remains intact.
If you don't include an extension in the file name, EDB*Loader assumes the file has an extension of .bad
, for example, mybadfile.bad
.
!!! Note
If you specify the BAD
parameter on the command line when invoking edbldr
, the file given with the command line BAD
parameter is used instead.
discard_file
File that receives input data records that aren't loaded into any table. This input data records are discarded because none of the selection criteria are met for tables with the WHEN
clause and there are no tables without a WHEN
clause. All records meet the selection criteria of a table without a WHEN
clause.
If you don't include an extension with the file name, EDB*Loader assumes the file has an extension of .dsc
, for example, mydiscardfile.dsc
.
!!! Note
If you specify the DISCARD
parameter on the command line when invoking edbldr
, the file given with the command line DISCARD
parameter is used instead.
{ DISCARDMAX | DISCARDS } max_discard_recs
Maximum number of discarded records that the input data records can encounter before terminating the EDB*Loader session. You can use either keyword DISCARDMAX
or DISCARDS
preceding the integer value specified by max_discard_recs
.
For example, if max_discard_recs
is 0
, then the EDB*Loader session is terminated when a first discarded record is encountered. If max_discard_recs
is 1
, then the EDB*Loader session is terminated when a second discarded record is encountered.
When the EDB*Loader session is terminated due to exceeding max_discard_recs
, prior input data records that were loaded into the database are retained. They aren't rolled back.
INSERT | APPEND | REPLACE | TRUNCATE
Specifies how to load data into the target tables. Specifying one of INSERT
, APPEND
, REPLACE
, or TRUNCATE
establishes the default action for all tables, overriding the default of INSERT
.
INSERT
Data is loaded into an empty table. EDB*Loader throws an exception and doesn't load any data if the table isn't initially empty.
Note
If the table contains rows, you must use the
TRUNCATE
command to empty the table before invoking EDB*Loader. EDB*Loader throws an exception if you use theDELETE
command to empty the table instead of theTRUNCATE
command. Oracle SQL*Loader allows you to empty the table by using either theDELETE
orTRUNCATE
command.APPEND
Data is added to any existing rows in the table. The table also can be empty initially.
REPLACE
The
REPLACE
keyword andTRUNCATE
keywords are functionally identical. The table is truncated by EDB*Loader before loading the new data.Note
Delete triggers on the table aren't fired as a result of the
REPLACE
operation.TRUNCATE
The table is truncated by EDB*Loader before loading the new data. Delete triggers on the table aren't fired as a result of the
TRUNCATE
operation.
PRESERVE BLANKS
The PRESERVE BLANKS
option works only with the OPTIONALLY ENCLOSED BY
clause. It retains leading and trailing whitespaces for both delimited and predetermined size fields.
In case of NO PRESERVE BLANKS
, if the fields are delimited, then only leading whitespaces are omitted. If any trailing whitespaces are present, they are left intact. In the case of predetermined-sized fields with NO PRESERVE BLANKS
, the trailing whitespaces are omitted. Any leading whitespaces are left intact.
!!! Note
If you don't explicitly provide PRESERVE BLANKS
or NO PRESERVE BLANKS
, then the behavior defaults to NO PRESERVE BLANKS
. This option doesn't work for ideographic whitespaces.
target_table
Name of the table into which to load data. The table name can be schema-qualified (for example, enterprisedb.emp
). The specified target can't be a view.
field_condition
Conditional clause taking the following form:
This conditional clause is used for the WHEN
clause, which is part of the INTO TABLE target_table
clause. It's also used for the NULLIF
clause, which is part of the field definition denoted as field_def
in the syntax diagram.
start
and end
are positive integers specifying the column positions in data_file
that mark the beginning and end of a field to compare with the constant val
. The first character in each record begins with a start
value of 1
.
column_name
specifies the name assigned to a field definition of the data file as defined by field_def
in the syntax diagram.
Using (start
:end
) or column_name
defines the portion of the record in data_file
to compare with the value specified by val
to evaluate as either true or false.
All characters used in the field_condition
text (particularly in the val
string) must be valid in the database encoding. For performing data conversion, EDB*Loader first converts the characters in val
string to the database encoding and then to the data file encoding.
In the WHEN field_condition [ AND field_condition ]
clause, if all such conditions evaluate to TRUE
for a given record, then EDB*Loader attempts to insert that record into target_table
. If the insert operation fails, the record is written to bad_file
.
Suppose, for a given record, that none of the WHEN
clauses evaluate to TRUE
for all INTO TABLE
clauses. The record is written to discard_file
if a discard file was specified for the EDB*Loader session.
See the description of the NULLIF
clause in this parameters list for the effect of field_condition
on this clause.
termstring
String of one or more characters that separates each field in data_file
. The characters can be single byte or multibyte. However, they must be valid in the database encoding. Two consecutive appearances of termstring
with no intervening character results in the corresponding column being set to null.
enclstring
String of one or more characters used to enclose a field value in data_file
. The characters can be single byte or multibyte, However, they must be valid in the database encoding. Use enclstring
on fields where termstring
appears as part of the data.
delimstring
String of one or more characters that separates each record in data_file
. The characters can be single byte or multibyte. However, they must be valid in the database encoding. Two consecutive appearances of delimstring
with no intervening character results in no corresponding row being loaded into the table. You must also terminate the last record (that is, the end of the data file) with the delimstring
characters. Otherwise, the final record isn't loaded into the table.
!!! Note
The RECORDS DELIMITED BY 'delimstring'
clause isn't compatible with Oracle databases.
TRAILING NULLCOLS
If you specify TRAILING NULLCOLS
, then the columns in the column list for which there's no data in data_file
for a given record are set to null when the row is inserted. This option applies only to one or more consecutive columns at the end of the column list.
If fields are omitted at the end of a record and you don't specify TRAILING NULLCOLS
, EDB*Loader assumes the record contains formatting errors and writes it to the bad file.
column_name
Name of a column in target_table
into which to insert a field value defined by field_def
. If the field definition includes the FILLER
or BOUNDFILLER
clause, then column_name
isn't required as the name of a column in the table. It can be any identifier name since the FILLER
and BOUNDFILLER
clauses prevent loading the field data into a table column.
CONSTANT val
Specifies a constant that's type-compatible with the column data type to which it's assigned in a field definition. You can use single or double quotes around val
. If val
contains white space, then you must use quotation marks around it.
The use of the CONSTANT
clause determines the value to assign to a column in each inserted row. No other clause can appear in the same field definition.
If you use the TERMINATED BY
clause to delimit the fields in data_file
, there must be no delimited field in data_file
corresponding to any field definition with a CONSTANT
clause. In other words, EDB*Loader assumes there's no field in data_file
for any field definition with a CONSTANT
clause.
FILLER
Specifies not to load the data in the field defined by the field definition into the associated column if the identifier of the field definition is an actual column name in the table. In this case, the column is set to null. Use of the FILLER
or BOUNDFILLER
clause is the only circumstance in which you don't have to identify the field definition with an actual column name.
Unlike the BOUNDFILLER
clause, you can't reference an identifier defined with the FILLER
clause in a SQL expression. See the discussion of the expr
parameter.
BOUNDFILLER
Specifies not to load the data in the field defined by the field definition into the associated column if the identifier of the field definition is an actual column name in the table. In this case, the column is set to null. Use of the FILLER
or BOUNDFILLER
clause is the only circumstance in which you don't have to identify the field definition with an actual column name.
Unlike the FILLER
clause, a SQL expression can reference an identifier defined with the BOUNDFILLER
clause. See the discussion of the expr
parameter.
POSITION (start:end)
Defines the location of the field in a record in a fixed-width field data file. start
and end
are positive integers. The first character in the record has a start value of 1
.
Field type that describes the format of the data field in data_file
.
!!! Note
Specifying a field type is optional for descriptive purposes and has no effect on whether EDB*Loader successfully inserts the data in the field into the table column. Successful loading depends on the compatibility of the column data type and the field value. For example, a column with data type NUMBER(7,2)
successfully accepts a field containing 2600
. However, if the field contains a value such as 26XX
, the insertion fails, and the record is written to bad_file
.
ZONED
data is not human readable. ZONED
data is stored in an internal format where each digit is encoded in a separate nibble/nybble/4-bit field. In each ZONED
value, the last byte contains a single digit in the high-order 4 bits and the sign in the low-order 4 bits.
length
Specifies the length of the value to load into the associated column.
If you specify the POSITION
(start
:end
) clause with a fieldtype(length)
clause, then the ending position of the field is overridden by the specified length
value. That is, the length of the value to load into the column is determined by the length
value beginning at the start
position and not by the end
position of the POSITION
(start
:end
) clause. Thus, the value to load into the column might be shorter than the field defined by POSITION
(start
:end
). Or, it might go beyond the end position, depending on the specified length
size.
If you specify the FIELDS TERMINATED BY 'termstring'
clause as part of the INTO TABLE
clause, and a field definition contains the fieldtype(length)
clause, then a record is accepted. However, the specified length
values must be greater than or equal to the field lengths as determined by the termstring
characters enclosing all such fields of the record. If the specified length
value is less than a field length as determined by the enclosing termstring
characters for any such field, then the record is rejected.
If you don't specify the FIELDS TERMINATED BY 'termstring'
clause, and you don't include the POSITION
(start
:end
) clause with a field containing the fieldtype(length)
clause, then the starting position of this field begins with the next character following the ending position of the preceding field. The ending position of the preceding field is either:
- The end of its
length
value if the preceding field contains thefieldtype(length)
clause - Its
end
parameter if the field contains thePOSITION
(start
:end
) clause without thefieldtype(length)
clause
precision
Use precision
to specify the length of the ZONED
value.
If the precision
value specified for ZONED
conflicts with the length calculated by the server based on information provided with the POSITION
clause, EDB*Loader uses the value specified for precision
.
scale
Specifies the number of digits to the right of the decimal point in a ZONED
value.
datemask
Specifies the ordering and abbreviation of the day, month, and year components of a date field.
!!! Note
If you specify the DATE
or TIMESTAMP
field type with a SQL expression for the column, then you must specify datemask
after DATE
or TIMESTAMP
and before the SQL expression. See the discussion of the expr
parameter.
When using the TIMESTAMP
field datatype, if you specify time_stamp timestamp "yyyymmddhh24miss"
, the datemask
is converted to the SQL expression. However, in the case of time_stamp timestamp "select to_timestamp(:time_stamp, 'yyyymmddhh24miss')"
, EDB*Loader can't differentiate between datemask and the SQL expression. It treats the third field (SQL expression in the example) as datemask and prepares the SQL expression, which isn't valid. Where:
first field
specifies the column name.second field
specifies the datatype.third field
specifies the datemask.
If you want to provide an SQL expression, then a workaround is to specify the datemask and SQL expression using the TO_CHAR
function as:
NULLIF field_condition [ AND field_condition ] ...
See the description of field_condition
in this parameter list for the syntax of field_condition
.
If all field conditions evaluate to TRUE
, then the column identified by column_name
in the field definition is set to null. If any field condition evaluates to FALSE
, then the column is set to the appropriate value as normally occurs according to the field definition.
PRESERVE BLANKS
The PRESERVE BLANKS
option works only with the OPTIONALLY ENCLOSED BY
clause and retains leading and trailing whitespaces for both delimited and predetermined size fields.
In case of NO PRESERVE BLANKS
, if the fields are delimited, then only leading whitespaces are omitted. If any trailing whitespaces are present, they are left intact. In the case of predetermined-sized fields with NO PRESERVE BLANKS
, the trailing whitespaces are omitted, and any leading whitespaces are left intact.
!!! Note
If you don't provide PRESERVE BLANKS
or NO PRESERVE BLANKS
, then the behavior defaults to NO PRESERVE BLANKS
. This option doesn't work for ideographic whitespaces.
expr
A SQL expression returning a scalar value that's type-compatible with the column data type to which it's assigned in a field definition. Use double quotes around expr
. expr
can contain a reference to any column in the field list except for fields with the FILLER
clause. Prefix the column name using a colon (:
).
expr
can also consist of a SQL SELECT
statement. If you use a SELECT
statement:
Enclose the
SELECT
statement in parentheses, that is,(SELECT ...)
.The select list must consist of one expression following the
SELECT
keyword.The result set must not return more than one row. If no rows are returned, then the returned value of the resulting expression is null.
The following is the syntax for a
SELECT
statement:Note
Omitting the
FROM table_list
clause isn't compatible with Oracle databases. If you don't need to specify any tables, using theFROM DUAL
clause is compatible with Oracle databases.
EDB* Loader control file examples
The following are some examples of control files and their corresponding data files.
Delimiter-separated field data file
This control file uses a delimiter-separated data file that appends rows to the emp
table. The APPEND
clause is used to allow inserting additional rows into the emp
table.
The following is the corresponding delimiter-separated data file:
The use of the TRAILING NULLCOLS
clause allows you to omit the last field supplying the comm
column from the first and last records. The comm
column is set to null for the rows inserted from these records.
Double quotation marks surround the value JONES, JR.
in the last record since the comma delimiter character is part of the field value.
This query displays the rows added to the table after the EDB*Loader session:
Fixed-width field data file
This control file loads the same rows into the emp
table. It uses a data file containing fixed-width fields. The FIELDS TERMINATED BY
and OPTIONALLY ENCLOSED BY
clauses are absent. Instead, each field includes the POSITION
clause.
The following is the corresponding data file containing fixed-width fields:
Single physical record data file – RECORDS DELIMITED BY clause
This control file loads the same rows into the emp
table but uses a data file with one physical record. Terminate each record loaded as a row in the table using a semicolon (;
). The RECORDS DELIMITED BY
clause specifies this value.
The following is the corresponding data file. The content is a single physical record in the data file. The record delimiter character is included following the last record, that is, at the end of the file.
FILLER clause
This control file uses the FILLER
clause in the data fields for the sal
and comm
columns. EDB*Loader ignores the values in these fields and sets the corresponding columns to null.
Using the same fixed-width data file as in the prior fixed-width field example, the resulting rows in the table appear as follows:
BOUNDFILLER clause
This control file uses the BOUNDFILLER
clause in the data fields for the job
and mgr
columns. EDB*Loader ignores the values in these fields and sets the corresponding columns to null in the same manner as the FILLER
clause. However, unlike columns with the FILLER
clause, you can use columns with the BOUNDFILLER
clause in an expression, as shown for column jobdesc
.
The following is the delimiter-separated data file used in this example:
The following table is loaded using the preceding control file and data file:
The resulting rows in the table appear as follows:
Field types with length specification
This control file contains the field type clauses with the length specification:
Note
You can use the POSITION
clause and the fieldtype(length)
clause individually or in combination as long as each field definition contains at least one of the two clauses.
The following is the corresponding data file containing fixed-width fields:
The resulting rows in the table appear as follows:
NULLIF clause
This example uses the NULLIF
clause on the sal
column to set it to null for employees of job MANAGER
. It also uses the clause on the comm
column to set it to null if the employee isn't a SALESMAN
and isn't in department 30
. In other words, a comm
value is accepted if the employee is a SALESMAN
or is a member of department 30
.
The following is the control file:
The following is the corresponding data file:
The resulting rows in the table appear as follows:
Note
The sal
column for employee JONES, JR.
is null since the job is MANAGER
.
The comm
values from the data file for employees PETERSON
, WARREN
, ARNOLDS
, and MAXWELL
are all loaded into the comm
column of the emp
table since these employees are either SALESMAN
or members of department 30
.
The comm
value of 2000.00
in the data file for employee JACKSON
is ignored, and the comm
column of the emp
table is set to null. This employee isn't a SALESMAN
or a member of department 30
.
SELECT statement in a field expression
This example uses a SELECT
statement in the expression of the field definition to return the value to load into the column:
The following is the content of the dept
table used in the SELECT
statement:
The following is the corresponding data file:
The resulting rows in the table appear as follows:
Note
The job
column contains the value from the dname
column of the dept
table returned by the SELECT
statement instead of the job name from the data file.
Multiple INTO TABLE clauses
This example uses multiple INTO TABLE
clauses. For this example, two empty tables are created with the same data definition as the emp
table. The following CREATE TABLE
commands create these two empty tables, without inserting rows from the original emp
table:
This control file contains two INTO TABLE
clauses. Without an APPEND
clause, it uses the default operation of INSERT
. For this operation, the tables emp_research
and emp_sales
must be empty.
The WHEN
clauses specify that when the field designated by columns 47 through 48 contains 20
, the record is inserted into the emp_research
table. When that same field contains 30
, the record is inserted into the emp_sales
table. If neither condition is true, the record is written to the discard file emp_multitbl.dsc
.
The CONSTANT
clause is given for column deptno
, so the specified constant value is inserted into deptno
for each record. When the CONSTANT
clause is used, it must be the only clause in the field definition other than the column name to which the constant value is assigned.
Column comm
of the emp_sales
table is assigned a SQL expression. Expressions can reference column names by prefixing the column name with a colon character (:
).
The following is the corresponding data file:
The records for employees ARNOLDS
and JACKSON
contain 10
and 40
in columns 47 through 48, which don't satisfy any of the WHEN
clauses. EDB*Loader writes these two records to the discard file, emp_multitbl.dsc
, with the following content:
The following are the rows loaded into the emp_research
and emp_sales
tables:
Invoking EDB*Loader
You can run EDB*Loader as superuser or as a normal user. Use the following command to invoke EDB*Loader from the command line:
Description
If you omit the -d
option, the -p
option, or the -h
option, the defaults for the database, port, and host are determined by the same rules as other EDB Postgres Advanced Server utility programs, such as edb-psql
.
You can specify parameters listed in the syntax diagram in a parameter file. Exeptions include the -d
option, -p
option, -h
option, and the PARFILE
parameter. Specify the parameter file on the command line when you invoke edbldr
using PARFILE=param_file
. You can specify some parameters in the OPTIONS
clause in the control file. For more information on the control file, see Building the EDB*Loader control file.
You can include the full directory path or a relative directory path to the file name when specifying control_file
, data_file
, bad_file
, discard_file
, log_file
, and param_file
. If you specify the file name alone or with a relative directory path, the file is assumed to exist in the case of control_file
, data_file
, or param_file
relative to the current working directory from which edbldr
is invoked. In the case of bad_file
, discard_file
, or log_file
, the file is created.
Note
The control file must exist in the character set encoding of the client where edbldr
is invoked. If the client is in an encoding different from the database encoding, then you must set the PGCLIENTENCODING
environment variable on the client to the client’s encoding prior to invoking edbldr
. This technique ensures character set conversion between the client and the database server is done correctly.
The operating system account used to invoke edbldr
must have read permission on the directories and files specified by control_file
, data_file
, and param_file
.
The operating system account enterprisedb
must have write permission on the directories where bad_file
, discard_file
, and log_file
are written.
Note
The file names for control_file
, data_file
, bad_file
, discard_file
, and log_file
must include the extensions .ctl
, .dat
, .bad
, .dsc
, and .log
, respectively. If the provided file name doesn't contain an extension, EDB*Loader assumes the actual file name includes the appropriate extension.
Parameters
dbname
Name of the database containing the tables to load.
port
Port number on which the database server is accepting connections.
host
IP address of the host on which the database server is running.
USERID={ username/password | username/ | username | / }
EDB*Loader connects to the database with username
. username
must be a superuser or a username with the required privileges. password
is the password for username
.
If you omit the USERID
parameter, EDB*Loader prompts for username
and password
. If you specify USERID=username/
, then EDB*Loader either:
Uses the password file specified by environment variable
PGPASSFILE
ifPGPASSFILE
is setUses the
.pgpass
password file (pgpass.conf
on Windows systems) ifPGPASSFILE
isn't setIf you specify
USERID=username
, then EDB*Loader prompts forpassword
. If you specifyUSERID=/
, the connection is attempted using the operating system account as the user name.Note
EDB*Loader ignores the EDB Postgres Advanced Server connection environment variables
PGUSER
andPGPASSWORD
. See the PostgreSQL core documentation for information on thePGPASSFILE
environment variable and the password file.
-c CONNECTION_STRING
connstr=CONNECTION_STRING
The -c
or connstr=
option allows you to specify all the connection parameters supported by libpq. With this option, you can also specify SSL connection parameters or other connection parameters supported by libpq. If you provide connection options such as -d
, -h
, -p
or userid=dbuser/dbpass
separately, they might override the values provided by the -c
or connstr=
option.
CONTROL=control_file
control_file
specifies the name of the control file containing EDB*Loader directives. If you don't specify a file extension, an extension of .ctl
is assumed.
For more information on the control file, see Building the EDB*Loader control file.
DATA=data_file
data_file
specifies the name of the file containing the data to load into the target table. If you don't specify a file extension, an extension of .dat
is assumed. Specifying a data_file
on the command line overrides the INFILE
clause specified in the control file.
For more information about data_file
, see Building the EDB*Loader control file.
BAD=bad_file
bad_file
specifies the name of a file that receives input data records that can't be loaded due to errors. Specifying a bad_file
on the command line overrides any BADFILE
clause specified in the control file.
For more information about bad_file
, see Building the EDB*Loader control file.
DISCARD=discard_file
discard_file
is the name of the file that receives input data records that don't meet any table’s selection criteria. Specifying a discard_file
on the command line overrides the DISCARDFILE
clause in the control file.
For more information about discard_file
, see Building the EDB*Loader control file.
DISCARDMAX=max_discard_recs
max_discard_recs
is the maximum number of discarded records that can be encountered from the input data records before terminating the EDB*Loader session. Specifying max_discard_recs
on the command line overrides the DISCARDMAX
or DISCARDS
clause in the control file.
For more information about max_discard_recs
, see Building the EDB*Loader control file.
HANDLE_CONFLICTS={ FALSE | TRUE }
If any record insertion fails due to a unique constraint violation, EDB*Loader aborts the entire operation. You can instruct EDB*Loader to instead move the duplicate record to the BAD
file and continue processing by setting HANDLE_CONFLICTS
to TRUE
. This behavior applies only if indexes are present. By default, HANDLE_CONFLICTS
is set to FALSE
.
Setting HANDLE_CONFLICTS
to TRUE
isn't supported with direct path loading. If you set this parameter to TRUE
when direct path loading, EDB*Loader throws an error.
LOG=log_file
log_file
specifies the name of the file in which EDB*Loader records the results of the EDB*Loader session.
If you omit the LOG
parameter, EDB*Loader creates a log file with the name control_file_base.log
in the directory from which edbldr
is invoked. control_file_base
is the base name of the control file used in the EDB*Loader session. The operating system account enterprisedb
must have write permission on the directory where the log file is written.
PARFILE=param_file
param_file
specifies the name of the file that contains command line parameters for the EDB*Loader session. You can specify command line parameters listed in this section in param_file
instead of on the command line. Exceptions are the -d
, -p
, and -h
options, and the PARFILE
parameter.
Any parameter given in param_file
overrides the same parameter supplied on the command line before the PARFILE
option. Any parameter given on the command line that appears after the PARFILE
option overrides the same parameter given in param_file
.
!!! Note
Unlike other EDB*Loader files, there's no default file name or extension assumed for param_file
. However, by Oracle SQL*Loader convention, .par
is typically used as an extension. It isn't required.
DIRECT= { FALSE | TRUE }
If DIRECT
is set to TRUE
EDB*Loader performs a direct path load instead of a conventional path load. The default value of DIRECT
is FALSE
.
Don't set DIRECT=true
when loading the data into a replicated table. If you're using EDB*Loader to load data into a replicated table and set DIRECT=true
, indexes might omit rows that are in a table or might contain references to rows that were deleted. EnterpriseDB doesn't support direct inserts to load data into replicated tables.
For information about direct path loads, see Direct path load.
FREEZE= { FALSE | TRUE }
Set FREEZE
to TRUE
to copy the data with the rows frozen
. A tuple guaranteed to be visible to all current and future transactions is marked as frozen to prevent transaction ID wraparound. For more information about frozen tuples, see the PostgreSQL core documentation.
You must specify a data-loading type of TRUNCATE
in the control file when using the FREEZE
option. FREEZE
isn't supported for direct loading.
By default, FREEZE
is FALSE
.
ERRORS=error_count
error_count
specifies the number of errors permitted before aborting the EDB*Loader session. The default is 50
.
PARALLEL= { FALSE | TRUE }
Set PARALLEL
to TRUE
to indicate that this EDB*Loader session is one of a number of concurrent EDB*Loader sessions participating in a parallel direct path load. The default value of PARALLEL
is FALSE
.
When PARALLEL
is TRUE
, the DIRECT
parameter must also be set to TRUE
.
For more information about parallel direct path loads, see Parallel direct path load.
ROWS=n
n
specifies the number of rows that EDB*Loader commits before loading the next set of n
rows.
SKIP=skip_count
Number of records at the beginning of the input data file to skip before loading begins. The default is 0
.
SKIP_INDEX_MAINTENANCE= { FALSE | TRUE }
If set to TRUE
, index maintenance isn't performed as part of a direct path load, and indexes on the loaded table are marked as invalid. The default value of SKIP_INDEX_MAINTENANCE
is FALSE
.
During a parallel direct path load, target table indexes aren't updated. They are marked as invalid after the load is complete.
You can use the REINDEX
command to rebuild an index. For more information about the REINDEX
command, see the PostgreSQL core documentation.
edb_resource_group=group_name
group_name
specifies the name of an EDB Resource Manager resource group to which to assign the EDB*Loader session.
Any default resource group that was assigned to the session is overridden by the resource group given by the edb_resource_group
parameter specified on the edbldr
command line. An example of such a group is a database user running the EDB*Loader session who was assigned a default resource group with the ALTER ROLE ... SET
edb_resource_group
command.
Examples
This example invokes EDB*Loader using a control file named emp.ctl
to load a table in database edb
. The file is located in the current working directory.
In this example, EDB*Loader prompts for the user name and password since they are omitted from the command line. In addition, the files for the bad file and log file are specified with the BAD
and LOG
command line parameters.
This example runs EDB*Loader using a parameter file located in the current working directory. The SKIP
and ERRORS
parameter default values are specified in the parameter file in addition the CONTROL
, BAD
, and LOG
files. The parameter file, emp.par
, contains the following:
Invoke EDB*Loader with the parameter file:
This example invokes EDB*Loader using a connstr=
option that uses the emp.ctl
control file located in the current working directory to load a table in a database named edb
:
This example invokes EDB*Loader using a normal user. For this example, one empty table bar
is created and a normal user bob
is created. The bob
user is granted all privileges on the table bar
. The CREATE TABLE command creates the empty table. The CREATE USER command creates the user and the GRANT command gives required privileges to the user bob
on the bar
table:
The control file and data file:
Invoke EDB*Loader:
Exit codes
When EDB*Loader exits, it returns one of the following codes:
Exit code | Description |
---|---|
0 | Indicates that all rows loaded successfully. |
1 | Indicates that EDB*Loader encountered command line or syntax errors or aborted the load operation due to an unrecoverable error. |
2 | Indicates that the load completed, but some (or all) rows were rejected or discarded. |
3 | Indicates that EDB*Loader encountered fatal errors (such as OS errors). This class of errors is equivalent to the FATAL or PANIC severity levels of PostgreSQL errors. |
Direct path load
During a direct path load, EDB*Loader writes the data directly to the database pages, which is then synchronized to disk. The insert processing associated with a conventional path load is bypassed, resulting in a performance improvement.
Bypassing insert processing reduces the types of constraints on the target table. The following types of constraints are permitted on the target table of a direct path load:
- Primary key
- Not null constraints
- Indexes (unique or non-unique)
The following are restrictions on the target table of a direct path load:
- Triggers aren't permitted.
- Check constraints aren't permitted.
- Foreign key constraints on the target table referencing another table aren't permitted.
- Foreign key constraints on other tables referencing the target table aren't permitted.
- You must not partition the table.
- Rules can exist on the target table, but they aren't executed.
Note
Currently, a direct path load in EDB*Loader is more restrictive than in Oracle SQL*Loader. The preceding restrictions don't apply to Oracle SQL*Loader in most cases. The following restrictions apply to a control file used in a direct path load:
- Multiple table loads aren't supported. You can specify only one
INTO TABLE
clause in the control file. - You can't use SQL expressions in the data field definitions of the
INTO TABLE
clause. - The
FREEZE
option isn't supported for direct path loading.
To run a direct path load, add the DIRECT=TRUE
option:
Parallel direct path load
You can further improve the performance of a direct path load by distributing the loading process over two or more sessions running concurrently. Each session runs a direct path load into the same table.
Since the same table is loaded from multiple sessions, the input records to load into the table must be divided among several data files. This way, each EDB*Loader session uses its own data file, and the same record isn't loaded into the table more than once.
The target table of a parallel direct path load is under the same restrictions as a direct path load run in a single session.
The following are the restrictions on the target table of a direct path load:
- Triggers aren't permitted.
- Check constraints aren't permitted.
- Foreign key constraints on the target table referencing another table aren't permitted.
- Foreign key constraints on other tables referencing the target table aren't permitted.
- You must not partition the table.
- Rules can exist on the target table, but they aren't executed.
In addition, you must specify the APPEND
clause in the control file used by each EDB*Loader session.
To run a parallel direct path load, run EDB*Loader in a separate session for each participant of the parallel direct path load. You must include the DIRECT=TRUE
and PARALLEL=TRUE
parameters when invoking each such EDB*Loader session.
Each EDB*Loader session runs as an independent transaction. Aborting and rolling back changes of one of the parallel sessions doesn't affect the loading done by the other parallel sessions.
Note
In a parallel direct path load, each EDB*Loader session reserves a fixed number of blocks in the target table using turns. Some of the blocks in the last allocated chunk might not be used, and those blocks remain uninitialized. A later use of the VACUUM
command on the target table might show warnings about these uninitialized blocks, such as the following:
This behavior is expected and doesn't indicate data corruption.
Indexes on the target table aren't updated during a parallel direct path load. They are therefore marked as invalid after the load is complete. You must use the REINDEX
command to rebuild the indexes.
This example shows the use of a parallel direct path load on the emp
table.
Note
If you attempt a parallel direct path load on the sample emp
table provided with EDB Postgres Advanced Server, you must first remove the triggers and constraints referencing the emp
table. In addition, the primary key column, empno
, was expanded from NUMBER(4)
to NUMBER
in this example to allow for inserting more rows.
This is the control file used in the first session:
You must specify the APPEND
clause in the control file for a parallel direct path load.
This example invokes EDB*Loader in the first session. You must specify the DIRECT=TRUE
and PARALLEL=TRUE
parameters.
The control file used for the second session appears as follows. It's the same as the one used in the first session, but it uses a different data file.
This control file is used in a second session:
EDB*Loader displays the following message in each session when the load operation completes:
This query shows that the index on the emp table was marked INVALID
:
Note
user_indexes
is the view of indexes compatible with Oracle databases owned by the current user.
Queries on the emp
table don't use the index unless you rebuild it using the REINDEX
command:
A later query on user_indexes
shows that the index is now marked as VALID
:
Remote loading
EDB*Loader supports a feature called remote loading. In remote loading, the database containing the table to load is running on a database server on a host different from the one where EDB*Loader is invoked with the input data source.
This feature is useful if you have a large amount of data to load, and you don't want to create a large data file on the host running the database server.
In addition, you can use the standard input feature to pipe the data from the data source, such as another program or script, directly to EDB*Loader. EDB*Loader then loads the table in the remote database. This feature bypasses having to create a data file on disk for EDB*Loader.
Performing remote loading along with using standard input requires the following:
- The
edbldr
program must be installed on the client host on which to invoke it with the data source for the EDB*Loader session. - The control file must contain the clause
INFILE 'stdin'
so you can pipe the data directly into EDB*Loader’s standard input. For information on theINFILE
clause and the EDB*Loader control file, see Building the EDB*Loader control file. - All files used by EDB*Loader, such as the control file, bad file, discard file, and log file, must reside on or be created on the client host on which
edbldr
is invoked. - When invoking EDB*Loader, use the
-h
option to specify the IP address of the remote database server. For more information, see Invoking EDB*Loader. - Use the operating system pipe operator (
|
) or input redirection operator (<
) to supply the input data to EDB*Loader.
This example loads a database running on a database server at 192.168.1.14
using data piped from a source named datasource
:
This example also shows how you can use standard input:
Updating a table with a conventional path load
You can use EDB*Loader with a conventional path load to update the rows in a table, merging new data with the existing data. When you invoke EDB*Loader to perform an update, the server searches the table for an existing row with a matching primary key:
- If the server locates a row with a matching key, it replaces the existing row with the new row.
- If the server doesn't locate a row with a matching key, it adds the new row to the table.
To use EDB*Loader to update a table, the table must have a primary key. You can't use EDB*Loader to update a partitioned table.
To perform UPDATE
, use the same steps as when performing a conventional path load:
- Create a data file that contains the rows you want to update or insert.
- Define a control file that uses the
INFILE
keyword to specify the name of the data file. For information about building the EDB*Loader control file, see Building the EDB*Loader control file. - Invoke EDB*Loader, specifying the database name, connection information, and the name of the control file. For information about invoking EDB*Loader, see Invoking EDB*Loader.
This example uses the emp
table that's distributed with the EDB Postgres Advanced Server sample data. By default, the table contains:
This control file (emp_update.ctl
) specifies the fields in the table in a comma-delimited list. The control file performs an UPDATE
on the emp
table.
The data that's being updated or inserted is saved in the emp_update.dat
file. emp_update.dat
contains:
Invoke EDB*Loader, specifying the name of the database (edb
), the name of a database user and their associated password, and the name of the control file (emp_update.ctl
):
After performing the update, the emp
table contains:
The rows containing information for the three employees that are currently in the emp
table are updated, while rows are added for the new employees (BAKER
and MILLS
).