Timestamp-Based Snapshots v3.7
The Timestamp-Based Snapshots feature of 2ndQPostgres allows reading data in a consistent manner via a user-specified timestamp rather than the usual MVCC snapshot. This can be used to access data on different BDR nodes at a common point-in-time; for example, as a way to compare data on multiple nodes for data quality checking. At this time, this feature does not work with write transactions.
Note
This feature is currently only available on EDB Postgres Extended.
The use of timestamp-based snapshots are enabled via the snapshot_timestamp
parameter; this accepts either a timestamp value or
a special value, 'current', which represents the current timestamp (now). If
snapshot_timestamp
is set, queries will use that timestamp to determine
visibility of rows, rather than the usual MVCC semantics.
For example, the following query will return state of the customers
table at
2018-12-08 02:28:30 GMT:
In plain 2ndQPostgres, this only works with future timestamps or the above mentioned special 'current' value, so it cannot be used for historical queries (though that is on the longer-term roadmap).
BDR works with and improves on that feature in a multi-node environment. Firstly,
BDR will make sure that all connections to other nodes replicated any
outstanding data that were added to the database before the specified
timestamp, so that the timestamp-based snapshot is consistent across the whole
multi-master group. Secondly, BDR adds an additional parameter called
bdr.timestamp_snapshot_keep
. This specifies a window of time during which
queries can be executed against the recent history on that node.
You can specify any interval, but be aware that VACUUM (including autovacuum) will not clean dead rows that are newer than up to twice the specified interval. This also means that transaction ids will not be freed for the same amount of time. As a result, using this can leave more bloat in user tables. Initially, we recommend 10 seconds as a typical setting, though you may wish to change that as needed.
Note that once the query has been accepted for execution, the query may run
for longer than bdr.timestamp_snapshot_keep
without problem, just as normal.
Also please note that info about how far the snapshots were kept does not survive server restart, so the oldest usable timestamp for the timestamp-based snapshot is the time of last restart of the PostgreSQL instance.
One can combine the use of bdr.timestamp_snapshot_keep
with the
postgres_fdw
extension to get a consistent read across multiple nodes in a
BDR group. This can be used to run parallel queries across nodes, when used in
conjunction with foreign tables.
There are no limits on the number of nodes in a multi-node query when using this feature.
Use of timestamp-based snapshots does not increase inter-node traffic or bandwidth. Only the timestamp value is passed in addition to query data.