Release Notes for pglogical 3.7 v3.7

pglogical 3.7.20

This is a maintenance release for PGLogical 3.7. It is equivalent to 3.7.19, but still gets a release and a version bump to match the BDR version number.

Upgrades

This release supports upgrading from following versions of pglogical:

  • 3.7.9 and higher
  • 3.6.29 and higher
  • 2.4.0 and 2.4.1

pglogical 3.7.19

This is a maintenance release for pglogical 3.7 which includes minor improvements as well as fixes for issues identified previously.

Resolved Issues

  • Adjust docs related to drop_subscription (RT86700, BDR-2705)
    The function accepts 3 arguments, specify it in the documentation.

Upgrades

This release supports upgrading from following versions of pglogical:

  • 3.7.9 and higher
  • 3.6.29 and higher
  • 2.4.0 and 2.4.1

pglogical 3.7.18

This is a maintenance release for pglogical 3.7 which includes minor improvements as well as fixes for issues identified previously.

Resolved Issues

  • Add Windows specific workaround for wal_sender_timeout

  • Resolve hang related to standby_slots_min_confirmed (BDR-2445, RT80692)
    Allow the WAl sender to proceed without the need to wait for the timeout after a Subscriber Only node is parted, while using standby_slot_names.

  • Don't replicate TRUNCATE as global message (BDR-2821, RT87453)
    The TRUNCATE command now takes the replication set into account.

Upgrades

This release supports upgrading from following versions of pglogical:

  • 3.7.9 and higher
  • 3.6.29 and higher
  • 2.4.0 and 2.4.1

pglogical 3.7.17

This is a maintenance release for pglogical 3.7 which includes minor improvements as well as fixes for issues identified previously.

Resolved Issues

  • Fix spurious pglogical receiver timeouts on idle connections (RT82315)
    When there is nothing to replicate and apply, the pglogical receiver may misinterpret the state as timeout and exit with error "ERROR: 08006: terminating pglogical receiver due to timeout". Correct this misinterpretation to prevent spurious restarts of the receiver and writer processes.

Upgrades

This release supports upgrading from following versions of pglogical:

  • 3.7.9 and higher
  • 3.6.29 and higher
  • 2.4.0 and 2.4.1

pglogical 3.7.16

This is a maintenance release for pglogical 3.7 which includes minor improvements as well as fixes for issues identified previously.

Resolved Issues

  • Keep the lock_timeout as configured on non-CAMO-partner BDR nodes (BDR-1916)
    A CAMO partner uses a low lock_timeout when applying transactions from its origin node. This was inadvertently done for all BDR nodes rather than just the CAMO partner, which may have led to spurious lock_timeout errors on pglogical writer processes on normal BDR nodes.

  • Prevent walsender processes spinning when facing lagging standby slots (RT80295, RT78290)
    Correct signaling to reset a latch so that a walsender process does consume 100% of a CPU in case one of the standby slots is lagging behind.

  • Provide a proper error when attempting to use pglogical.use_spi (RT76368)

  • Reduce log information when switching between writer processes (BDR-2196)

  • Eliminate a memory leak when replicating partitioned tables (RT80981, BDR-2194)

Upgrades

This release supports upgrading from following versions of pglogical:

  • 3.7.9 and higher
  • 3.6.29 and higher
  • 2.4.0 and 2.4.1

pglogical 3.7.15

This is a maintenance release for pglogical 3.7 which includes minor improvements as well as fixes for issues identified previously.

Improvements

  • Add pglogical.max_worker_backoff_delay (BDR-1767) This changes the handling of the backoff delay to exponentially increase from pglogical.min_worker_backoff_delay to pglogical.max_worker_backoff_delay in presence of repeated errors. This reduces log spam and in some cases also prevents unnecessary connection attempts.

Resolved Issues

  • For the view pglogical.subscriptions, ensure the correct value is displayed in the provider_dsn column (BDR-1859)
    Previously the subscriber node's DSN was being displayed.

  • Fix handling of wal_receiver_timeout (BDR-1848)
    The wal_receiver_timeout has not been triggered correctly due to a regression in BDR 3.7 and 4.0.

  • Limit the pglogical.standby_slot_names check when reporting flush position only to physical slots (RT77985, RT78290)
    Otherwise flush progress is not reported in presence of disconnected nodes when using pglogical.standby_slot_names.

  • Confirm LSN of LCR slot progress records when recovering LCR segments (BDR-1264)

  • Fix replication of data types created during bootstrap (BDR-1784)

Upgrades

This release supports upgrading from following versions of pglogical:

  • 3.7.9 and higher
  • 3.6.29 and higher
  • 2.4.0 and 2.4.1

pglogical 3.7.14

This is a maintenance release for pglogical 3.7 which includes minor improvements as well as fixes for issues identified previously.

Improvements

  • Implement buffered read for LCR segment file (BDR-1422)
    This should reduce both I/O operations done by decoding worker as well as the amount of signalling between the decoding worker and wal senders which results in improved performance when decoding worker is used.

Resolved Issues

  • Invalidate output plugin's replication set cache when published replication sets change (BDR-1715)
    This ensures that published replicated sets are always up to date when user changes what should be published on node level. Before this fix, the old settings would have been applied until downstream has reconnected. This primarily affects BDR in situations where the bdr.alter_node_replication_sets was called on multiple nodes in quick succession.

  • Don't try to retry transaction replication when writer receives SIGTERM or SIGINT
    This would unnecessarily hold back the shutdown of postgres when parallel apply is enabled.

  • Log correct error message during initial schema synchronization error (BDR-1598)
    Previously we'd always log ERROR with message "Success" which is not helpful for troubleshooting.

Upgrades

This release supports upgrading from following versions of pglogical:

  • 3.7.9 and higher
  • 3.6.29 and higher
  • 2.4.0
  • 2.4.1

pglogical 3.7.13

This is a maintenance release for pglogical 3.7 which includes minor improvements as well as fixes for issues identified previously.

Improvements

  • Allow the BDR consensus process to use a (sub) replorigin (BDR-1613)
    For parallel apply, multiple writers could use invididual origins that would be merged, thus act kind of as child origins for a main or parent origin. Extend this mechanism to allow the BDR consensus process to also register an origin participating in the group.

Resolved Issues

  • Fix the replication protocol handling of some strings (RT74123)
    Certain strings like e.g. relation schema, relation name, attribute name, and message prefix were mistakenly changed to not include a terminating NULL character. This led to a protocol incompatibility when interoperating with PGLogical 3.5.3 and older or when WAL messages are used (e.g. due to the use of BDR features CAMO or Eager All Node Replication).

  • Fix segfault when creating a subscription with the RabbitMQ writer plugin (BDR-1482, RT72434)

  • Don't wait on own replication slot when waiting for standby_slot_names (RT74036)
    The walsenders that use slots named in standby_slot_names should not wait for anything, otherwise it might wait forever.

  • Reduce logging when WAL decoder is not available (BDR-1041)
    pgl_xact_fully_decoded() will log a message when it finds that the WAL decoder is not running. WAL decoder may not start for long duration and thus this log line will repeat and increase the log file size. There are other ways to know whether WAL decoder is running. So this line is not required.

  • Handle deprecated update_differing and update_recently_deleted (BDR-1610, RT74973)
    While upgrading to 3.7, take care of the deprecated conflict types otherwise, as they would be unknown to the new version, it would not know how to handle them and break the upgrade process.

  • Enable async conflict resolution for explicit 2PC (BDR-1609, RT71298)
    Continue applying the transaction using the async conflict resolution for explicit two phase commit.

Upgrades

This release supports upgrading from following versions of pglogical:

  • 3.7.9 and higher
  • 3.6.28
  • 2.4.0
  • 2.4.1

pglogical 3.7.12

This is a maintenance release for pglogical 3.7 which includes minor improvements as well as fixes for issues identified previously.

Improvements

  • Add replication status monitoring (BDR-865)
    Track the connection establishment and drops, the number of transactions committed as well as the restart LSN to detect repeated restarts and diagnose stalled replication.

  • Improve performance when used in combination with synchronous replication (BDR-1398)
    Override synchronous_commit to local' for all PGLogical processes performing only bookkeeping transactions which do not need to be replicated. Only the PGLogical writer processes applying user transactions need to follow the synchronous_commit mode configured for synchronous replication.

  • Internal improvements and additional hooks to better support BDR

Resolved Issues

  • Performance improvements for Decoding Worker (BDR-1311, BDR-1357) Speed up lookups of the WAL decoder worker, reduce delays after reaching up to the LSN proviously known to be decoded by the Decoding Worker, and reduce number of system calls when writing one LCR chunk to an LCR segment file.

  • Improve compatibility with Postgres 13 (BDR-1396)
    Adjust to an API change in ReplicationSlotAcquire that may have led to unintended blocking when non-blocking was requestend and vice versa. This version of PGLogical eliminates this potential problem, which has not been observed on production systems so far.

Upgrades

This release supports upgrading from following versions of pglogical:

  • 3.7.9 and higher
  • 3.6.27
  • 2.4.0

pglogical 3.7.11

This is a maintenance release for pglogical 3.7 which includes minor improvements as well as fixes for issues identified previously.

Resolved Issues

  • Add protection against malformed parameter values in pgl output plugin
    This fixes potential crash when some parameters sent to the output plugin were malformed.

  • Get copy of slot tuple when logging conflict (BDR-734)
    Otherwise we could materialize the row early causing wrong update in presence of additional columns on the downstream.

  • Use a separate memory context for processing LCRs (BDR-1237, RT72165)
    This fixes memory leak when using the decoding worker feature of BDR.

  • Truncate LCR segment file after recovery (BDR-1236, BDR-1259)
    This fixes memory errors reported by the decoding worker after crash.

  • Ensure pg_read_and_filter_lcr will exit when postmaster dies (BDR-1226, BDR-1209, RT72083)
    Solves issues with hanging decoder worker on shutdown.

  • Fix memory leak in the pglogical COPY handler (BDR-1219, RT72091)
    This fixes memory leak when synchronizing large tables.

  • Allow binary and internal protocol on more hardware combinations. This currently only affects internal testing.

Upgrades

This release supports upgrading from following versions of pglogical:

  • 3.7.9 and higher
  • 3.6.27
  • 2.4.0

pglogical 3.7.10

This is a maintenance release for pglogical 3.7 which includes minor improvements as well as fixes for issues identified previously.

Improvements

  • Windows support improvements (BDR-976, BDR-1083)

Resolved Issues

  • Fix potential crash during cleanup of bulk copy replication (BDR-1168)

  • Fix issues in generic WAL message handling when the WAL message was produced by something other than pglogical (BDR-670)

  • Redefine werr_age of pglogical.worker_error_summary to report correct age

  • Only use key attributes of covering unique index when used as replica identity This only affects what is being sent over network, no logic change.

pglogical 3.7.9

This is a maintenance release for pglogical 3.7 which includes minor improvements as well as fixes for issues identified previously.

Improvements

  • Support two-phase commit transaction with Decoding Worker (BDR-811)

    A two-phase commit transaction is completely decoded and sent downstream when processing its PREPARE WAL record. COMMIT/ROLLBACK PREPARED is replicated separately when processing the corresponding WAL record.

  • Reduce writes to pg_replication_origin when using parallel apply (RT71077)
    Previously, especially on EPAS pglogical could produce thousands of dead rows in pg_replication_origin system catalog if it had connection problems to upstream.

Resolved Issues

  • Fix flush queue truncation (BDR-890)
    During queue compaction flush to correct LSN instead of always truncating whole flush queue.

  • Fix the pglogical.worker_error columns werr_worker_index and werr_time
    Used to report wrong values.

  • Fix snapshot handling in internal executor usage (BDR-904)
    Row filtering didn't correctly pushed snapshot in some situations.

    Caught thanks to upcoming change in PostgreSQL that double checks for this.

  • Fix handling of domains over arrays and composite types (BDR-29)
    The replication protocol previously could not always handle columns using domains over arrays and domains over composite types.

pglogical 3.7.8

This is a first stable release of the pglogical 3.7.

pglogical 3.7 is a major release of pglogical. This release includes major new features as well as smaller enhancements.

Upgrades are supported from pglogical 3.6.25 and 3.7.7 in this release.

Improvements

  • Allow parallel apply on EDB Advanced Server (EE)

Resolved Issues

  • Fix divergence after physical failover (BDR-365, RT68894 and RM19886)
    Make sure that we don't report back LSN on subscriber that hasn't been received by named standbys (pglogical.standby_slot_names).

    This will ensure provider side keeps slot position behind far enough so that if subscriber is replaced by one of said named standbys, the standby will be able to fetch any missing replication stream from the original provider.

  • Fix crash when ERROR happened during fast shutdown of pglogical writer

  • Don't re-enter worker error handling loop recursively (BDR-667)
    This should help make what happens clearer in any cases where we do encounter errors during error processing.

  • Assign collation to the index scan key (BDR-561)
    When doing lookups for INSERT/UPDATE/DELETE, either to find conflicts or key for the operation to be applied, we should use correct collation.

    This fixes index lookups for index on textual fields on Postgres 12+.

  • Use name data type for known fixed length fields (BDR-561)
    This solves potential index collation issues with pglogical catalogs.

  • Progress WAL sender's slot based on WAL decoder input (BDR-567)
    Without slot progression the server will eventually stop working.

  • Fix begin of transaction write when LCR file does not have enough space (BDR-606)

  • Restart decoding a transaction that was not completed in single decoding worker (BDR-247)
    If we crashed during a long transaction that spawns on more then one lcr file we start decoding again from the lsn of the beginning of the transaction and find the last lcr file where we can write into.

  • Generate temp slot with correct length in subscription sync
    Otherwise the name of the slot might be shortened by Postgres leading to confusion on the slot cleanup.

  • Improve detection of mixing temporary and nontemporary objects in DDLs (BDR-485)
    These can break replication so it's important to not allow them.

  • Fix pre-commit message handling in Eager replication (BDR-492)

  • Override GUCs in all pglogical workers, not just in writers.

pglogical 3.7.7

This is a beta release of the pglogical 3.7.

pglogical 3.7 is a major release of pglogical. This release includes major new features as well as smaller enhancements.

Upgrades are supported from pglogical 3.6.25 and 3.7.6 in this release.

Improvements

  • Use parallel apply during initial sync during logical join

  • Add worker process index to the worker_error catalog (BDR-229)
    The column werr_worker_index in worker_error table keeps track of the writers for the same subscription.

  • Various improvements for WAL decoder/sender coordination (BDR-232, BDR-335)

  • Name LCR segment similar to XLOG segments (BDR-236, BDR-253, BDR-321, BDR-322)
    An LCR segment file is identified by five 8-digit hex numbers. Timeline first group, XLOG the next two groups and file number the last two groups.

Resolved Issues

  • Restrict adding queue table to the replication set. (EDBAS, EBD-45)

  • Fix deadlock between receiver and writer during queue flush (BDR-483)

  • Force and wait for all writers to exit when one writer dies (BDR-229)

  • Name LCR directory after the replication slot (BDR-60)
    Logical output plugin may be used by multiple replication slots. Store the LCRs from a given replication slot in a directory named after that replication slot to avoid mixing LCRs for different slots.

  • Fix EXPLAIN...INTO TABLE replication issue (EBC-46)

pglogical 3.7.6

This is a beta release of the pglogical 3.7.

pglogical 3.7 is a major release of pglogical. This release includes major new features as well as smaller enhancements.

Upgrades are supported from pglogical 3.6.25 in this release.

Improvements

  • Enable parallel apply for CAMO and Eager (RM17858)

  • Improve error handling in Eager/CAMO, especially with parallel apply (BDR-106)

  • Reduce severity of Eager/CAMO feedback errors

  • Add infrastructure necessary for allowing separation of WAL decoding from WalSender process in BDR (RM18868, BDR-51, BDR-60)

Resolved Issues

  • Improve relcache invalidation handling in heap writer
    This should solve missed invalidations after opening table for DML apply.

  • Wait for the writer that has XID assigned rather then one that doesn't (BDR-137)
    This fixes deadlock in parallel apply when there is a mix of empty and non-empty transactions where the non-empty ones conflict.

  • Correct writer state tracking for Eager/CAMO (BDR-107)

  • Correct and improve CAMO misconfiguration handling (BDR-105)
    Properly abort the transaction in case of sync CAMO, so the internal state of the PGL writer is cleared

  • Fix transaction state tracking for CAMO/Eager
    Do not abort the transaction at pre-commit time for non-CAMO nodes, but keep it open until the final commit. Adjust the transaction state tracking accordingly

  • Fix MERGE handling in 2ndQPostgres 12 and 13
    We'd before allow the MERGE command on replicated tables without appropriate REPLICA IDENTITY which could result in broken replication.

  • Fix CAMO feedback sending (RM17858)
    Fixes stalls in CAMO feedback, improving performance compared to previous 3.7 beta releases. This is especially visible with parallel apply enabled.

pglogical 3.7.5

This is a beta release of the pglogical 3.7.

pglogical 3.7 is a major release of pglogical. This release includes major new features as well as smaller enhancements.

Improvements

  • Optimize utility command processing (RT69617) For commands that won't affect any DB objects and don't affect pglogical we can skip the processing early without reading any pglogical or system catalogs or calling to DDL replication plugin interfaces. This is optimization for systems with large number of such utility command calls (that is primarily applications that do explicit transaction management).

  • Add upgrade path from pglogical 2.

Resolved Issues

  • Ensure that pglogical.standby_slot_names takes effect when pglogical.standby_slots_min_confirmed is at the default value of -1.

    On 3.6.21 and older pglogical.standby_slot_names was ignored if pglogical.standby_slot_names is set to zero (RM19042).

    Clusters satisfying the following conditions may experience inter-node data consistency issues after a provider failover:

    • Running pglogical 3.0.0 through to 3.6.21 inclusive;
    • Using pglogical subscriptions/or providers directly (BDR3-managed subscriptions between pairs of BDR3 nodes are unaffected);
    • Have a physical standby (streaming replica) of a pglogical provider intended as a failover candidate;
    • Have pglogical.standby_slot_names on the provider configured to list that physical standby;
    • Have left pglogical.standby_slots_min_confirmed unconfigured or set it explicitly to zero;

    This issue can cause inconsistencies between pglogical provider and subscriber and/or between multiple subscribers when a provider is replaced using physical replication based failover. It's possible for the subscriber(s) to receive transactions committed to the pre-promotion original provider that will not exist on the post-promotion replacement provider. This causes provider/subscriber divergence. If multiple subscribers are connected to the provider, each subscriber could also receive a different subset of transactions from the pre-promotion provider, leading to inter-subscriber divergence as well.

    The pglogical.standby_slots_min_confirmed now defaults to the newly permitted value -1, meaning "all slots listed in pglogical.standby_slot_names". The default of 0 on previous releases was intended to have that effect, but instead effectively disabled physical-before-logical replication.

    To work around the issue on older versions the operator is advised to set pglogical.standby_slots_min_confirmed = 100 in postgresql.conf. This has no effect unless pglogical.standby_slot_names is also set.

    No action is generally required for this issue on BDR3 clusters. BDR3 has its own separate protections to ensure consistency during promotion of replicas.

  • Fix pglogical_create_subscriber when "-v" is passed. It will make pg_ctl emit meaningful information, making it easier to debug issues where pg_ctl fails

pglogical 3.7.4

This is a beta release of the pglogical 3.7.

pglogical 3.7 is a major release of pglogical. This release includes major new features as well as smaller enhancements.

Improvements

  • Support PostgreSQL 13

  • Add pglogical.replication_origin_status view (RM17074)
    Same as pg_replication_origin_status but does not require superuser permissions to access it.

  • Beta support of upgrades from 3.6 (currently from 3.6.22)

  • Improved SystemTAP support

Resolved Issues

  • Fix race condition in replication table filtering which could cause crash (RM18839)
    The cached info about table might get invalidated while used which would crash the backend during one of the following operations:

    • reading the pglogical.tables view
    • new subcription creation
    • table resynchronization
  • Don't do transparent DDL replication on commands that affect temporary objects (RM19491, RT69170)
    These are likely to not exist on the subscription.

  • Only run pgl specific code in deadlock detector when inside writer (RM18402)
    It's not relevant in user backends and would cause ERRORs there.

pglogical 3.7.3

This is a beta release of the pglogical 3.7.

pglogical 3.7 is a major release of pglogical. This release includes major new features as well as smaller enhancements.

Improvements

  • Parallel Apply (RM6503) Allows configuring number of writers per subscriptions. By default this is still 1, which mean parallel apply is off by default but can be enabled both globally (pglogical.writers_per_subscription) and per subscription (num_writers option in pglogical.create_subscription() and pglogical.alter_subscription_num_writers()).

  • Split "replicating" subscription status into two statuses
    One is still "replicating" and is reported only if something was actually replicated by the subcription since last worker start. The other is "started" and just means that the worker for the subscription is running at the time the status query was executed.
    This should reduce confusion where subscription would report "replicating" status but worker is in restart loop due to an apply error.

  • Substantial test and testing framework improvements

  • Improve 2ndQPostgres and BDR version dependency handling (RM17024)

  • Add PostgreSQL 12 support to pglogical_create_subscriber

  • Rework table resynchronization to use separate receiver process from the main one
    This improves performance of both main apply (primarily latency) and the resynchronization itself.
    It also fixes potential issue where table could be considered synchronized before the catchup finished completely.

Resolved Issues

  • Fix crash on resynchronization of large partitioned tables (RM18154, RM15733, RT68455, RT68352)
    The resync process would keep crashing due to cache invalidation race condition if the COPY took very long or if there was DDL activity on the copied table during the COPY.

  • Prohibit MERGE and UPSERT on a table without replica identity (RM17323, RT68146) These commands can end up doing UPDATE which will break replication if the table has no replica identity as the downstream has no way to find the matching row for updating.

  • Resolve relcache reference leak reports (RM16956) Close the relation correctly in pglogical.show_repset_tables_info()

  • Resolve rare crash in HeapWriter row cleanup code (RM16956)

  • Resolve rare crash on worker exit (RM11686)
    If a pglogical worker exited before it finished initialization it could crash instead of exiting cleanly.

  • Fix apply errors parallel index rebuild afrer TRUNCATE (RM17602)

pglogical 3.7.2

This is a beta release of the pglogical 3.7.

pglogical 3.7 is a major release of pglogical. This release includes major new features as well as smaller enhancements.

pglogical 3.7 introduces several major new features as well as architectural changes some of which affect backward compatibility with existing applications.

Important Notes

  • Beta software is not supported in production - for application test only

  • Upgrade from 3.6 is not supported in this release, yet.

Improvements

  • Add support for Postgres 12, deprecate support for older versions
    pglogical 3.7 now requires at least Postgres 10 and supports up to Postgres 12.

Resolved Issues

  • Keep open the connection until pglogical_create_subscriber finishes (RM13649)
    Set idle_in_transaction_session_timeout to 0 so we avoid any user setting that could close the connection and invalidate the snapshot.