Performing data validation v7
The current working directory from which you invoke the Data Validator script runValidation.sh
(runValidation.bat
for Windows) must be the bin
subdirectory containing the script (that is, XDB_HOME/bin
).
For example, if the Replication Server is installed in its default directory location, then issue the following command before invoking the Data Validator:
cd /usr/edb/xdb/bin
Similarly for Windows hosts, issue the following:
cd C:\Program Files\edb\EnterpriseDB-xDBReplicationServer\bin
The general command format for invoking the Data Validator is the following:
schema_name
is the name of the schema in the source database containing the tables to validate.
For Windows hosts, the command format is the following:
The following option displays the Data Validator version:
On Linux the version is displayed as follows:
On Windows the version is displayed as follows:
The following option displays the help information.
./runValidation.sh { –h | --help }
For example,
New enhancement
The --use-ora-case option is available in Replication Server 7.1 and later.
Note
Data Validator does not support column mapping. With custom column mapping, table columns in the source and target databases can have different data types. As a result, the Data Validator is not able to validate the data when column mapping is applied.
The general syntax for all options except for --version
and --help
is shown by the following:
For clarity, the syntax diagram shows only the single-character form of the option. The description of each option lists both the single-character and multi-character forms.
Specifying any database connection option (-sdbms
through -tpw
listed in the syntax diagram) overrides the corresponding parameter in the datavalidator.properties
file. See Installing and configuring the data validator for information on the datavalidator.properties
file.
Options
-ss, --source-schema schema
The schema of the source database containing the tables to compare against the target database.
-ts, --target-schema schema
The schema of the target database containing the tables to compare against the source database. If omitted, the schema of the target database is the same schema as specified for the source database with the -ss
option.
-it, --include-tables table_1 [,table_2 ] ...
The tables in the source schema to include for comparison. If omitted, all tables in the source schema are compared against tables in the target schema. The exception is the tables excluded from comparison using the -et
option. Don't use white space between the comma and table names.
-et, --exclude-tables table_1 [,table_2 ] ...
The tables in the source schema to exclude from comparison. If omitted, only those tables specified with the -it
option are included for comparison. If both the -it
and -et
options are omitted, all source schema tables are included for comparison. Don't use white space between the comma and table names.
-srs, --skip-rowsonlyin-source { true | false }
When you specify true
, the logging of differences for rows that exist only in the source database table are skipped. The default is false
.
-srt, --skip-rowsonlyin-target { true | false }
When you specify true
, the logging of differences for rows that exist only in the target database table are skipped. The default is false
.
-srb, --skip-rowsin-both { true | false }
When you specify true
, the logging of differences for rows:
- That exist both in the source and target database tables
- Have the same primary key
- Have different non-primary key values
are skipped. The default is false
.
-ld, --logging-dir log_directory_path
Directory path to create and store the Data Validator log and diff files. If log_directory_path
doesn't exist, Data Validator attempts to create it. If a full directory path isn't specified log_directory_path
is created or assumed to be located relative to the XDB_HOME/bin
subdirectory where the runValidation.sh
script is invoked. (That is, the logs
directory is XDB_HOME/bin/log_directory_path
.) Be sure the operating system account used to invoke the runValidation.sh
script has the privileges to create the directory if needed or to create files in the specified directory. The default is the XDB_HOME/bin/logs
directory.
-ds, --display-summary { true | false }
Specify true
to display only the Data Validator summary. This value omits the source and target database connection information as well as the detailed breakdown of the results by source database table. Specify false
to display all of the Data Validator results. The type and amount of information that is displayed at the command line console when the Data Validator is invoked is the same information that is also stored in the log file for that run. The default is false
.
-sdbms, --source-dbms database_type
The type of the source database server. Supported types are oracle
, enterprisedb
, sqlserver
, sybase
, and mysql
.
-sh, --source-host host
The IP address or server name of the host where the source database server is running.
-sp, --source-port port
The port number on which the source database server is listening for connections.
-sdb, --source-database dbname
The name of the source database.
-su, --source-user user
The database user name for connecting to the source database.
-spw, --source-password password
The password of the source database user in unencrypted form.
-tdbms, --target-dbms database_type
The type of the target database server. Supported types are enterprisedb
and oracle
.
-th, --target-host host
The IP address or server name of the host whereh the target database server is running.
-tp, --target-port port
The port number on which the target database server is listening for connections.
-tdb, --target-database dbname
The name of the target database.
-tu, --target-user user
The database user name for connecting to the target database.
-tpw, --target-password password
The password of the target database user in unencrypted form.
-bs, --batch-size row_count
The -bs
option specifies the number of rows to group in a batch to use for comparison across the source and target database tables. For example, if a table contains 1000 rows, then a -bs
setting of 100 requires 10 batch iterations to complete the comparison across the source and target databases. The Data Validator reads 100 rows, both from the source and target tables, and adds them in source and target buffers. The validation thread then reads the 100 rows from the source and target buffers and performs the comparison. It then moves to read and prepare the next 100 rows for comparison, and so on. The actual database round trips required to bring in 100 rows from the database depends on the -fs
option for the fetch size. For example, an -fs
setting of 100 needs just one round trip and an -fs
setting of 10 requires 10 database round trips.
-fs, --fetch-size row_count
Performing data validation for tables that are quite large can cause the Data Validator to stop with an out-of-heap-space error when using the default fetch size of 5000 rows. Use the -fs
option to specify a smaller fetch size to help avoid the out-of-heap-space issue. The result set iteration brings in as many rows as represented by the row_count
value in a single database round trip.
Examples
The following examples use an Oracle source database and an EDB Postgres Advanced Server target database to compare the tables in schema EDB
on Oracle against the tables in schema public
in EDB Postgres Advanced Server.
The following lists the tables in schema EDB along with the content of tables DEPT
and EMP
in the Oracle source database:
The following lists the tables in schema public along with the content of tables dept and emp in the EDB Postgres Advanced Server edb
database:
Note the following differences:
- The Oracle
EDB
schema contains one moew table namedORATAB
that doesn't exist in the EDB Postgres Advanced Server public schema. - The Oracle DEPT table contains one extra row with
DEPTNO 50
that doesn't exist in the EDB Postgres Advanced Server dept table. - The rows in the EMP table with
EMPNO
values9001
and9002
have column values that differ between the Oracle and EDB Postgres Advanced Server tables. - In this example, the JOBHIST table contains identical rows for both the Oracle and Advanced Server tables.
The content of the datavalidator.properties
file is set as follows:
The following example compares all tables in the Oracle EDB
schema against the EDB Postgres Advanced Server public schema.
The Data Validator log files are created in directory /home/user/datavalidator_logs
as specified with the -ld
option. The operating system account used to invoke the runValidation.sh
script has write access to the /home/user
directory so the Data Validator can create the datavalidator_logs
subdirectory.
The Data Validator output indicates the following:
- There is one error in the DEPT table (the missing row).
- There are two errors in the EMP table (the two rows with mismatching column values).
- The JOBHIST table contains no errors.
- The ORATAB table doesn't exist on the target database.
The following shows the files created in the Data Validator logs
directory:
The log file contains the same content as displayed when the Data Validator is invoked. The diff file compares the differences where errors were detected.
The following example includes only tables dept and emp with the -it
option when comparing the Oracle EDB schema against the Advanced Server public schema.
The following example excludes tables ORATAB and jobhist with the -et
option when comparing the Oracle EDB schema against the EDB Postgres Advanced Server public schema. The -ds true
option results in the display of only the Data Validator summary.
For this run, the corresponding log file contains only the Data Validator summary, omitting the source and target database connection information along with the error breakdown by table.