Consensus layer considerations v3.7
HARP is designed to work with different implementations of the consensus layer, also known as Distributed Control Systems (DCS).
Currently the following DCS implementations are supported:
- etcd
- BDR
This information is specific to HARP's interaction with the supported DCS implementations.
BDR driver compatibility
The bdr native consensus layer is available from BDR versions 3.6.21 and 3.7.3.
BDR Logical Standby nodes don't participate in consensus communications in an EDB Postgres Distributed cluster, so they don't contribute to the voting quorum. Don't count these in the total node list to fulfill DCS quorum requirements.
Maintaining quorum
Clusters of any architecture require at least n/2 + 1 nodes (with n/2 rounded down) to maintain consensus via a voting quorum. Thus a three-node cluster can tolerate the outage of a single node, a five-node cluster can tolerate a two-node outage, and so on. If consensus is ever lost, HARP becomes inoperable because the DCS prevents it from deterministically identifying the node that is the lead master in a particular location.
As a result, whichever DCS is chosen, more than half of the nodes must always be available cluster-wide. This can become a non-trivial element when distributing DCS nodes among two or more data centers. A network partition prevents quorum in any location that can't maintain a voting majority, and thus HARP stops working.
Thus an odd number of nodes (with a minimum of three) is crucial when building the consensus layer. An ideal case distributes nodes across a minimum of three independent locations to prevent a single network partition from disrupting consensus.
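The arithmetic behind the odd-number recommendation can be worked through directly. The symbols here are introduced only for illustration: n is the number of DCS nodes, q(n) is the quorum size, and f(n) is the number of node outages the cluster can tolerate.

```latex
q(n) = \left\lfloor \frac{n}{2} \right\rfloor + 1, \qquad f(n) = n - q(n)

q(3) = 2,\; f(3) = 1 \qquad q(4) = 3,\; f(4) = 1 \qquad q(5) = 3,\; f(5) = 2
```

Going from three nodes to four raises the quorum size but not the outage tolerance, so an even node count adds a node without adding resilience.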
One example configuration is to designate two DCS nodes in each of two data centers coinciding with the primary BDR nodes, and a fifth DCS node (such as a BDR witness) elsewhere. Using such a design, a network partition between the two BDR data centers doesn't disrupt consensus thanks to the independently located node.
Multi-consensus variant
HARP assumes one lead master per configured location. Normally each location is specified in HARP using the location configuration setting. By creating a separate DCS cluster per location, you can emulate this behavior independently of HARP.
To accomplish this, configure HARP in config.yml to use a different DCS connection target per desired location.
HARP nodes in DC-A use something like this:
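A minimal sketch of such a config.yml fragment, assuming the etcd driver listening on its default client port 2379. The location name dca and the hostnames host1 through host3 are placeholders, not values from a real deployment:

```yaml
# Sketch only: location name and hostnames are hypothetical placeholders.
location: dca
dcs:
  driver: etcd
  endpoints:
    # etcd members that live in DC-A only
    - host1:2379
    - host2:2379
    - host3:2379
```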
While DC-B uses different hostnames corresponding to nodes in its canonical location:
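Under the same assumptions (etcd driver, default port 2379), the DC-B counterpart might look like the following. The location name dcb and the hostnames host4 through host6 are again placeholders:

```yaml
# Sketch only: location name and hostnames are hypothetical placeholders.
location: dcb
dcs:
  driver: etcd
  endpoints:
    # etcd members that live in DC-B only; none are shared with DC-A
    - host4:2379
    - host5:2379
    - host6:2379
```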
There's no DCS communication between different data centers in this design, and thus a network partition between them doesn't affect HARP operation. A consequence of this is that HARP is completely unaware of nodes in the other location, and each location operates essentially as a separate HARP cluster.
This isn't possible when using BDR as the DCS, as BDR maintains a consensus layer across all participant nodes.
A possible drawback to this approach is that harpctl can't interact with nodes outside of the current location. It's impossible to obtain node information, get or set the lead master, or perform any other operation that targets the other location. Essentially this organization renders the --location parameter to harpctl unusable.
TPAexec and consensus
These considerations are integrated into TPAexec as well. When deploying a cluster using etcd, it constructs a separate DCS cluster per location, favoring high availability over strict consistency.
Thus this configuration example groups any DCS nodes assigned to the first location together, and the second location is a separate cluster:
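A minimal sketch of the relevant part of a TPAexec config.yml, assuming that the failover_manager and harp_consensus_protocol cluster variables select HARP with etcd. The location names first and second follow the description above, and everything else a real cluster definition needs (instances, versions, and so on) is left out:

```yaml
# Sketch only: an excerpt, not a complete TPAexec cluster definition.
cluster_vars:
  failover_manager: harp
  harp_consensus_protocol: etcd

locations:
  # Each location gets its own etcd cluster by default.
  - Name: first
  - Name: second
```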
To override this behavior, configure harp_location explicitly to force a particular grouping.
Thus this example returns all etcd nodes into a single cohesive DCS layer:
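A minimal sketch of such an override, assuming harp_location is set as a per-location variable under vars. The shared value dca is a placeholder, and any name works as long as both locations use the same one:

```yaml
# Sketch only: an excerpt, not a complete TPAexec cluster definition.
cluster_vars:
  failover_manager: harp
  harp_consensus_protocol: etcd

locations:
  - Name: first
    vars:
      harp_location: dca   # placeholder name shared by both locations
  - Name: second
    vars:
      harp_location: dca   # same value, so etcd nodes form one DCS cluster
```

With a single DCS cluster spanning both locations, the quorum requirements described under Maintaining quorum apply to the combined node count.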
The harp_location override might also be necessary to favor specific node groupings when using cloud providers such as Amazon that favor availability zones in regions over traditional data centers.