Loading tables from an external data source (offline snapshot) v7
You might want to initially load your target tables (subscription tables of a single-master replication system, or non-MDN nodes of a multi-master replication system) using a method other than the snapshot replication functionality of Replication Server. This method is referred to as using an offline snapshot.
For example, you might at first load the tables by running the Migration Toolkit from the command line or by using a backup from an external data source. When you load the target tables using an offline snapshot, you must take into account special preparations for the following deviations from the default target table creation and loading process:
- In the typical default scenario, Replication Server creates the target table definitions when you define the subscription in a single-master replication system or add another primary node in a multi-master replication system. When using an offline snapshot, creating the target table definitions is your responsibility. You must therefore prevent Replication Server from creating the target table definitions.
- In the typical default scenario, Replication Server performs synchronization replication using batches of SQL statements. If any statement in a batch results in an error, all statements in the batch are rolled back. When using an offline snapshot, the external data source used to load the target tables might already have transactions applied to it that are also recorded in the shadow tables of the source tables. In this case, you must perform the first synchronization replication in non-batch mode. This condition can result in a statement failure in certain cases.
Non-batch mode synchronization
Synchronization replications are done in batches of updates, each batch committed in a separate transaction. If any single update in a batch fails, all the updates in the batch are rolled back.
This process has the following implications.
Prior to and during the time when the offline snapshot is in progress, there might be updates to the source tables, which are recorded in the source tables’ shadow tables. After the offline snapshot completes, there might be more updates to the source tables that are also recorded in the shadow tables.
Since Replication Server does not know about the external data source used to load the target tables, Replication Server doesn't know whether any of the updates made to the source tables during or after the offline snapshot were already included in the data used to load the target tables.
As a result, the shadow tables might include a mixture of duplicate updates that were already applied to the target tables as well as new updates that weren't applied to the target tables.
If you then perform synchronization replication, the publication server attempts to apply all updates recorded in the shadow tables in batches.
If one of the updates was inserting a new row and this new row is already in the target table loaded from the offline snapshot, a duplicate key error results when the publication server attempts to apply the batch containing the INSERT
statement for this row. The duplicate key error forces the rollback of the entire batch. This causes the exclusion of updates in the batch that might not yet have been carried over to the target tables. The source tables and target tables are now inconsistent, since there were updates to the source tables that weren't applied to the target tables.
Note
The effects of applying UPDATE
and DELETE
statements in the batch to a target table that was already changed by these updates doesn't cause the same problem as repeated application of INSERT
statements. The UPDATE
statement changes the row to the same values a second time. When a DELETE
statement doesn't affects any rows, this isn't considered an error by the database server. No rollback of the batch occurs.
The solution to the potential rollback of a batch is to apply the shadow table updates in non-batch mode. That is, commit each SQL statement individually. In that way, if inserting a row fails due to a duplicate key error, that statement alone is rolled back. The error doesn't affect the other shadow table updates that must be applied since all updates are enclosed in their own, individual transactions.
The batchInitialSync
configuration option controls whether the first synchronization replication occurs in batch or non-batch mode. Suppose you're using an offline snapshot in an active replication system where updates are occurring to the source tables. Transactions are thus accumulating in the shadow tables for the trigger-based method. In this case, we recommend setting batchInitialSync
to false
to perform the first synchronization replication in non-batch mode.
Note
You can't use an offline snapshot to add a subscription or a primary node to an active replication system that uses the log-based method. For the log-based method, you can use offline snapshots only to first configure the system. You can't use them to update the system with additional nodes after the publication database or primary node is actively receiving transactions.
Suppose you're using offline snapshots to first create the entire replication system that has yet to be activated. The content of the offline snapshots are all assumed to be consistent for the source and target tables. You can then leave batchInitialSync
with its default setting of true
. It is assumed that the first synchronization replication will not apply any duplicate updates.
Offline snapshot configuration options
The following are the configuration options that you need to modify when using an offline snapshot.
Note
These options apply only to the publication server.
offlineSnapshot
You must set the offlineSnapshot
to true
before creating the subscription for a single-master replication system or before adding the primary node for a multi-master replication system.
offlineSnapshot={true | false}
The default value is false
.
When set to true
, the offlineSnapshot
option prevents the usual creation of the subscription schema and table definitions when the subscription is defined in a single-master replication system. It is assumed that you're creating the subscription table definitions and loading them from an external source other than the publication.
When adding the primary node in a multi-master replication system, leave the Replicate Publication Schema and Perform Initial Snapshot boxes cleared (see Creating more primary nodes).
When offlineSnapshot
is set to true
, this configuration has the direct effect in the control schema by setting column has_initial_snapshot
to a value of O
. This setting indicates an offline snapshot is used for the target subscription or primary node represented by the row. Column has_initial_snapshot
is set in table _edb_replicator_pub.xdb_publication_subscriptions for a single-master replication system and in table _edb_replicator_pub.xdb_mmr_pub_group for a multi-master replication system.
After the first replication completes to the target subscription or primary node, has_initial_snapshot
is changed to Y
by Replication Server.
The setting of has_initial_snapshot
influences the behavior of the batchInitialSync
option.
batchInitialSync
The batchInitialSync
option controls whether the first synchronization after loading the target tables from an offline snapshot is done in batch mode (the default) or non-batch mode.
Set the batchInitialSync
option to false
to perform synchronization replication in non-batch mode.
You must set the offlineSnapshot
configuration option to true
prior to creating the subscription or adding another primary node. A non-batch mode synchronization occurs only if batchInitialSync
is false
and the has_initial_snapshot
column in the control schema is set to a value of O
as described for the offlineSnapshot
option.
batchInitialSync={true | false}
The default value is true
.
Single-master replication offline snapshot
You can use an offline snapshot to first load the subscription tables of a single-master replication system. For a publication that's intended to have multiple subscriptions, you can create some of the subscriptions using the default Replication Server snapshot replication process as described in Performing snapshot replication. You can create other subscriptions from an offline snapshot.
To create a subscription from an offline snapshot:
Register the publication server, add the publication database definition, and create the publication as described in Creating a publication.
Register the subscription server and add the subscription database definition as described in Registering a subscription server and Adding a subscription database.
Note
You must perform steps 3 and 4 before creating the subscription. You can repeat steps 5 through 9 each time you want to create another subscription from an offline snapshot.
Modify the publication server configuration file if these options aren't already set as described by the following:
- Change the
offlineSnapshot
option totrue
. When you restart the publication server or reload the publication server's configuration via reloadconf,offlineSnapshot
set totrue
has two effects. One is that creating a subscription doesn't create the schema and subscription table definitions in the subscription database as is done with the default setting. The other is that creating a subscription sets a column in the control schema indicating an offline snapshot is used to load this subscription. - Set the
batchInitialSync
option to the appropriate setting for your situation as discussed at the end of Non-batch mode synchronization.
- Change the
If you modified the publication server configuration file in Step 3, reload configuration of the publication server. See "Reloading the Publication or Subscription Server Configuration File (reloadconf)" for directions on reloading the publication server's configuration.
In the subscription database, create the schema and the subscription table definitions, and load the subscription tables from your offline data source. The subscription database user name used in Adding a subscription database must have full privileges over the database objects created in this step. Also review the beginning of Adding a subscription database regarding the rules as to how Replication Server creates the subscription definitions from the publication for each database type. You must follow these same conventions when you create the target definitions manually.
Add the subscription as described in Adding a subscription.
Perform an on-demand synchronization replication. See Performing synchronization replication to learn how to perform an on-demand synchronization replication.
If you aren't planning to load any other subscriptions using an offline snapshot at this time, change the
offlineSnapshot
option back tofalse
and thebatchInitialSync
option totrue
in the publication server configuration file.If you modified the publication server configuration file in step 8, reload configuration of the publication server.
Multi-master replication offline snapshot
You can use an offline snapshot to first load the primary nodes of a multi-master replication system. You can load some of the primary nodes using the Replication Server snapshot replication functionality when defining the primary node as described in Creating more primary nodes or by using an on-demand snapshot as described in Performing snapshot replication. You can load other primary nodes from an offline snapshot.
Note
Offline snapshots aren't supported for a multi-master replication system that's actively in use. Any changes on an active primary node are lost during the offline snapshot process of dumping or restoring the data of another node.
To create a primary node from an offline snapshot:
Register the publication server, add the primary definition node, and create the publication as described in Creating a publication.
Note
You must perform the steps 3 and 4 before adding a primary node to be loaded by an offline snapshot. You can repeat Steps 5 through 10 each time you want to create another primary node from an offline snapshot.
Be sure there's no schedule defined on the replication system. If there is, remove the schedule until you complete this process. See Removing a schedule for details.
Modify the publication server configuration file so the options are set as follows:
- Set the
offlineSnapshot
option totrue
. When you restart the publication server or reload the publication server's configuration via reloadconf, this setting has the effect that adding a primary node sets a column in the control schema indicating an offline snapshot is used to load this primary node. - Set the
batchInitialSync
option to the appropriate setting for your situation as discussed at the end of Non-batch mode synchronization.
- Set the
If you modified the publication server configuration file in step 3, reload configuration of the publication server. See "Reloading the Publication or Subscription Server Configuration File (reloadconf)" for directions to reload the publication server's configuration.
In the database to use as the new primary node, create the schema and the table definitions, and load the tables from your offline data source.
Add the primary node as described in Creating more primary nodes with the options Replicate Publication Schema and Perform Initial Snapshot cleared.
Perform an initial on-demand synchronization. See Performing synchronization replication to learn how to perform an on demand-synchronization.
If you aren't planning to load any other primary nodes using an offline snapshot at this time, change the
offlineSnapshot
option back tofalse
and thebatchInitialSync
option totrue
in the publication server configuration file.If you modified the publication server configuration file in step 8, reload configuration of the publication server.
Add the schedule again if you removed it. See Creating a schedule to learn how to create a schedule.