Application Schema Upgrades v3.7
In this chapter we discuss upgrading software on an EDB Postgres Distributed cluster and how to minimize downtime for applications during the upgrade.
Overview
An EDB Postgres Distributed cluster has two sets of software: the underlying PostgreSQL software (or a flavor of it) and the PGLogical/BDR software. We discuss upgrading either or both of these to their supported major releases.
To upgrade an EDB Postgres Distributed cluster, the following steps need to be performed on each node:
- plan the upgrade
- prepare for the upgrade
- upgrade the server software
- restart Postgres
- check and validate the upgrade
Upgrade Planning
While the BDR 3.6 release supports PostgreSQL major versions 10 and 11, BDR 3.7 supports PostgreSQL 11, 12 and 13. Please refer to the Compatibility matrix for the full list of compatible software. Since BDR 3.7 supports newer PostgreSQL releases, upgrading from BDR 3.6 to BDR 3.7 also makes it possible to upgrade to a newer PostgreSQL release with minimal or no application downtime.
There are broadly two ways to upgrade the BDR version:
- Upgrading one node at a time to the newer BDR version.
- Joining a new node running a newer version of the BDR software and then optionally dropping one of the old nodes.
If you are only interested in upgrading the BDR software, either method can be used. But if you also want to upgrade the PostgreSQL version, then the second method must be used.
Rolling Server Software Upgrades
A rolling upgrade is the process in which the Server Software Upgrade described below is performed on each node in the BDR Group one after another, while replication continues to work.
An upgrade to 3.7 is only supported from 3.6, using a specific minimum maintenance release (e.g. 3.6.25). Please consult the Release Notes for the actual required minimum version. If a node is running an older 3.6 release, it must first be upgraded to that minimum maintenance release before it can be upgraded to 3.7.
Just as with a single-node database, it's possible to stop all nodes, perform the upgrade on all nodes and only then restart the entire cluster. This strategy of upgrading all nodes at the same time avoids running with mixed BDR versions and therefore is the simplest, but obviously incurs some downtime.
During the upgrade process, the application can be switched over to a node which is currently not being upgraded to provide continuous availability of the BDR group for applications.
While the cluster is going through a rolling upgrade, replication happens between mixed versions of BDR3. For example, nodeA will have BDR 3.6.25, while nodeB and nodeC will have 3.7.8. In this state, replication and group management use the protocol and features from the oldest version (3.6.25 in this example), so any new features provided by the newer version which require changes in the protocol will be disabled. Once all nodes are upgraded to the same version, the new features are automatically enabled.
An EDB Postgres Distributed cluster is designed to be easily upgradeable. Most BDR releases support rolling upgrades, which means running part of the cluster on one release level and the remaining part of the cluster on a second, compatible, release level.
A rolling upgrade starts with a cluster with all nodes at a prior release, then proceeds by upgrading one node at a time to the newer release, until all nodes are at the newer release. Should problems occur, do not attempt to downgrade without contacting Technical Support to discuss and provide options.
An upgrade process may take an extended period of time when the user decides caution is required to reduce business risk, though this should not take any longer than 30 days without discussion and explicit agreement from Technical Support to extend the period of coexistence of two release levels.
In case of problems during upgrade, do not initiate a second upgrade to a newer/different release level. Two upgrades should never occur concurrently in normal usage. Nodes should never be upgraded to a third release without specific and explicit instructions from Technical Support. A case where that might occur is if an upgrade failed for some reason and a Hot Fix was required to continue the current cluster upgrade process to successful conclusion. BDR has been designed and tested with more than 2 release levels, but this cannot be relied upon for production usage except in specific cases.
Rolling Upgrade Using Node Join
The other method of upgrading the BDR software, with or without upgrading the underlying PostgreSQL major version, is to join a new node to the cluster and later drop one of the existing nodes running the older version of the software. Even with this method, some features that are available only in the newer version of the software may remain unavailable until all nodes are finally upgraded to the newer versions.
A new node running this release of BDR 3.7.8 can join a 3.6 cluster, where each node in the cluster is running the latest 3.6.x version of BDR. The joining node may run any of the supported PostgreSQL versions 11-13, but you must not mix the Standard and Enterprise editions. If the older cluster is running a Standard Edition then it's recommended that the new joining node should also run a Standard Edition. Similarly, if the old cluster is running Enterprise Edition, the new joining node should also run the Enterprise Edition.
Care must be taken not to use features that are available only in the newer PostgreSQL versions 12-13 until all nodes are upgraded to the newer and same release of PostgreSQL. This is especially true for any new DDL syntax that may have been added in the newer releases of PostgreSQL.
Note that bdr_init_physical makes a byte-by-byte copy of the source node, so it cannot be used while upgrading from one major PostgreSQL version to another. In fact, bdr_init_physical currently requires that even the BDR version of the source node and the joining node be exactly the same, so it cannot be used for rolling upgrades via the node join method. In all such cases, a logical join must be used.
Upgrading a CAMO-Enabled Cluster
CAMO protection requires at least one node of a CAMO pair to be operational. For upgrades, we recommend either ensuring that no CAMO-protected transactions are running concurrently with the upgrade, or using a rolling upgrade strategy, giving the nodes enough time to reconcile between the upgrades and the corresponding node downtime.
Upgrade Preparation
BDR 3.7 contains several changes that may affect compatibility with previous releases. These may affect the Postgres configuration and deployment scripts, as well as applications using BDR. We recommend reviewing these changes and making any necessary adjustments in advance of the upgrade.
Node Management
The bdr.create_node_group() function has seen a number of changes:
- It is now possible to create sub-groups, resulting in a tree-of-groups structure of the whole EDB Postgres Distributed cluster. Monitoring views were updated accordingly.
- The deprecated parameters insert_to_update, update_to_insert, ignore_redundant_updates, check_full_tuple and apply_delay were removed. Use bdr.alter_node_set_conflict_resolver() instead of insert_to_update and update_to_insert. The check_full_tuple parameter is no longer needed, as it is handled automatically based on the table conflict detection configuration.
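For example, behavior previously configured through the removed parameters can instead be expressed as a per-node conflict resolver. This is a minimal sketch, assuming a node named node1 and using the update_missing conflict type with the insert_or_skip resolver; adjust the conflict type and resolver to match your previous settings:

```sql
-- Sketch: replace the removed insert_to_update/update_to_insert behaviour
-- with an explicit conflict resolver on the node (names shown are illustrative).
SELECT bdr.alter_node_set_conflict_resolver('node1', 'update_missing', 'insert_or_skip');
```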
Conflicts
The configuration of conflict resolution and logging is now copied from join source node to the newly joining node, rather than using defaults on the new node.
The default conflict resolution for some of the conflict types was changed. See the Conflicts documentation (conflicts.md#default-conflict-resolvers) for the new defaults.
The conflict logging interfaces have changed from bdr.alter_node_add_log_config and bdr.alter_node_remove_log_config to bdr.alter_node_set_log_config.
The default conflict logging table is now named bdr.conflict_history and the old bdr.apply_log no longer exists. The new table is partitioned using the new Autopartition feature of BDR 3.7.
All conflicts are now logged by default to both the log file and the conflict table.
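For example, recently recorded conflicts can be inspected directly from the new table; this is a minimal sketch assuming the default logging configuration:

```sql
-- Inspect conflicts recorded in the new default logging table
-- (bdr.conflict_history replaces the removed bdr.apply_log).
SELECT * FROM bdr.conflict_history LIMIT 20;
```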
The deprecated functions bdr.row_version_tracking_enable() and bdr.row_version_tracking_disable() were removed. Use bdr.alter_table_conflict_detection() instead.
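As an illustration, row-version based conflict detection can be enabled per table through the new interface; the table name and method shown here are assumptions, not taken from this document:

```sql
-- Sketch: enable row_version conflict detection on a table,
-- replacing the removed bdr.row_version_tracking_enable() call.
SELECT bdr.alter_table_conflict_detection('public.orders', 'row_version');
```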
Some of the configuration for conflict handling is no longer stored in the pglogical schema. Any diagnostic queries that were using the pglogical tables directly will have to switch to the appropriate tables in the bdr schema.
Queries using bdr.node_group, bdr.local_node_summary or bdr.node_local_info will need to use the new columns sub_repsets and pub_repsets instead of replication_sets.
Removed Or Renamed Settings (GUCs)
The setting report_transaction_id has been removed and is no longer known to Postgres. It had already been deprecated in the 3.6.x branch, and the underlying functionality is now enabled automatically when needed. It is therefore safe to remove report_transaction_id from your configuration or reset it via ALTER SYSTEM, even on 3.6.22 (and newer). Otherwise, Postgres will refuse to start after the upgrade and will report an "unrecognized configuration parameter" error.
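A minimal sketch of removing the setting before the upgrade, as described above:

```sql
-- Safe on 3.6.22 and newer: drop the deprecated setting before upgrading to 3.7.
ALTER SYSTEM RESET report_transaction_id;
SELECT pg_reload_conf();
```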
The GUC to enable CAMO has moved from Postgres to BDR and has been renamed from pg2q.enable_camo to bdr.enable_camo.
Server Software Upgrade
The upgrade of BDR software on individual nodes happens in-place. There is no need for backup and restore when upgrading the BDR extension.
The first step in the upgrade is to install the new version of the BDR packages, which will install both the new binary and the extension SQL script. This step depends on the operating system used.
Restart Postgres
Upgrading the binary and extension scripts by itself does not upgrade BDR in the running instance of PostgreSQL. To do that, the PostgreSQL instance needs to be restarted so that the new BDR binary can be loaded (the BDR binary is loaded at the start of the PostgreSQL server). After that, the node is upgraded. The extension SQL upgrade scripts are executed automatically as needed.
Warning
It's important to never run the ALTER EXTENSION ... UPDATE command before the PostgreSQL instance is restarted, as that will only upgrade the SQL-visible extension but keep the old binary, which can cause unpredictable behaviour or even crashes. The ALTER EXTENSION ... UPDATE command should never be needed; BDR3 maintains the SQL-visible extension automatically as needed.
Upgrade Check and Validation
After this procedure, your BDR node is upgraded. You can verify the current version of the BDR3 binary like this:
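A minimal check, assuming the bdr.bdr_version() function is available on the node:

```sql
SELECT bdr.bdr_version();
```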
The upgrade of BDR3 will usually also upgrade the version of pglogical 3 installed in the system. The current version of pglogical can be checked using:
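For example, assuming the pglogical.pglogical_version() function is available:

```sql
SELECT pglogical.pglogical_version();
```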
Always check the monitoring after upgrade of a node to confirm that the upgraded node is working as expected.
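One way to confirm that all nodes report the expected software versions is a monitoring query along these lines (this assumes the bdr.monitor_group_versions() function is available in your release):

```sql
SELECT * FROM bdr.monitor_group_versions();
```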
Database Encoding
We recommend using UTF-8 encoding in all replicated databases. BDR does not support replication between databases with different encodings. There is currently no supported path to upgrade or alter the encoding.
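The encoding of the existing databases can be checked with a standard catalog query, for example:

```sql
SELECT datname, pg_encoding_to_char(encoding) AS encoding FROM pg_database;
```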
Similar to the upgrade of BDR itself, there are two approaches to upgrading the application schema. The simpler option is to stop all affected applications, perform the schema upgrade, and restart the applications once they have been upgraded to use the new schema variant. Again, this imposes some downtime.
To eliminate this downtime, BDR offers ways to perform a rolling application schema upgrade as documented in the following section.
Rolling Application Schema Upgrades
By default, DDL will automatically be sent to all nodes. This can be controlled manually, as described in DDL Replication, which could be used to create differences between database schemas across nodes. BDR is designed to allow replication to continue even while minor differences exist between nodes. These features are designed to allow application schema migration without downtime, or to allow logical standby nodes for reporting or testing.
Warning
Application Schema Upgrades are managed by the user, not by BDR. Careful scripting will be required to make this work correctly on production clusters. Extensive testing is advised.
Details of this are covered in Replicating between nodes with differences.
When one node runs DDL that adds a new table, nodes that have not yet received the latest DDL will need to cope with the extra table. In view of this, the appropriate setting for rolling schema upgrades is to configure all nodes to apply the skip resolver in case of a target_table_missing conflict. This must be performed before any node has additional tables added, and is intended to be a permanent setting.
This is done with the following query, which must be executed separately on each node, after replacing node1 with the actual node name:
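For example (assuming the bdr.alter_node_set_conflict_resolver(node_name, conflict_type, conflict_resolver) signature):

```sql
SELECT bdr.alter_node_set_conflict_resolver('node1', 'target_table_missing', 'skip');
```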
When one node runs DDL that adds a column to a table, nodes that have not yet received the latest DDL will need to cope with the extra columns. In view of this, the appropriate setting for rolling schema upgrades is to configure all nodes to apply the ignore resolver in case of a target_column_missing conflict. This must be performed before any node has additional columns added, and is intended to be a permanent setting.
This is done with the following query, which must be executed separately on each node, after replacing node1 with the actual node name:
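For example, using the same function:

```sql
SELECT bdr.alter_node_set_conflict_resolver('node1', 'target_column_missing', 'ignore');
```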
When one node runs DDL that removes a column from a table, nodes that have not yet received the latest DDL will need to cope with the missing column. This situation will cause a source_column_missing conflict, which uses the use_default_value resolver. Thus, columns that neither accept NULLs nor have a DEFAULT value will require a two-step process (sketched below):
- Remove NOT NULL constraint or add a DEFAULT value for a column on all nodes.
- Remove the column.
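A minimal sketch of the two-step process, using illustrative table and column names:

```sql
-- Step 1: relax the constraint (or add a DEFAULT) on all nodes via replicated DDL.
ALTER TABLE public.orders ALTER COLUMN discount_code DROP NOT NULL;
-- alternatively: ALTER TABLE public.orders ALTER COLUMN discount_code SET DEFAULT '';

-- Step 2: once step 1 has replicated everywhere, drop the column.
ALTER TABLE public.orders DROP COLUMN discount_code;
```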
Constraints can be removed in a rolling manner. There is currently no supported way of adding table constraints in a rolling manner, one node at a time.
When one node runs DDL that changes the type of an existing column, the operation may not rewrite the underlying table data, depending on whether the current type is binary coercible to the target type. In that case, it is only a metadata update of the underlying column type. A rewrite of a table is normally restricted. However, in controlled DBA environments, it is possible to change the type of a column to an automatically castable one by adopting a rolling upgrade for the type of this column in a non-replicated environment on all the nodes, one by one. More details are provided in the ALTER TABLE section.
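A sketch of such a rolling, metadata-only type change, run on each node in turn; it assumes the bdr.ddl_replication setting can be disabled for the transaction and that the change (varchar to text here) is binary coercible, so no table rewrite occurs:

```sql
BEGIN;
-- Keep this DDL local to the current node (assumes the bdr.ddl_replication GUC).
SET LOCAL bdr.ddl_replication = off;
-- Binary-coercible change: metadata-only, no table rewrite.
ALTER TABLE public.orders ALTER COLUMN note TYPE text;
COMMIT;
```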