Creating a Failover Manager cluster v4

Failover Manager is a high-availability tool that allows a Postgres primary node to automatically fail over to a standby node if a software or hardware failure occurs on the primary node.

This tutorial describes configuring a Failover Manager cluster in a test environment. Before configuring Failover Manager for a production deployment, read and understand the rest of the Failover Manager documentation.

Using EDB Postgres Advanced Server as an example (Failover Manager also works with PostgreSQL), follow these steps for basic installation and configuration before beginning the tutorial:

Install and initialize a database server on one primary and one or two standby nodes. For information about installing, refer to the EDB Postgres Advanced Server documentation.
Configure Postgres streaming replication between the primary and standby nodes, and confirm that it is running. For detailed information about configuring streaming replication, refer to Configuring streaming replication.
Install Failover Manager on each primary and standby node. During EDB Postgres Advanced Server installation, you configured an EDB repository on each database host. You can use the EDB repository and the yum install command to install Failover Manager on each node of the cluster:
```
yum install edb-efm46
```

During the installation process, the installer creates a user named efm that has privileges to invoke scripts that control the Failover Manager service for clusters owned by enterprisedb or postgres. The example that follows creates a cluster named efm.

Start the configuration process on a primary or standby node. Then, copy the configuration files to other nodes to save time.

Create working configuration files. Copy the provided sample files to create Failover Manager configuration files, and change their ownership to the efm user. If you are installing a different version, adjust the version number in the directory path:
```
cd /etc/edb/efm-4.6

cp efm.properties.in efm.properties

cp efm.nodes.in efm.nodes

chown efm:efm efm.properties

chown efm:efm efm.nodes
```
Create an encrypted password needed for the properties file:
```
/usr/edb/efm-4.6/bin/efm encrypt efm
```
Follow the onscreen instructions to produce the encrypted version of your database password.

Update efm.properties. The <cluster_name>.properties file (efm.properties in this example) contains parameters that specify connection properties and behaviors for your Failover Manager cluster. Modifications to property settings are applied when Failover Manager starts.

The properties mentioned in this tutorial are the minimal properties required to configure a Failover Manager cluster. If you're configuring a production system, review Configuring Failover Manager for detailed information about Failover Manager options.

Provide values for the following properties on all cluster nodes:

| Property | Description |
|---|---|
| `db.user` | The name of the database user. |
| `db.password.encrypted` | The encrypted password of the database user. |
| `db.port` | The port on which the database listens. |
| `db.database` | The name of the database. |
| `db.service.owner` | The owner of the `data` directory (usually `postgres` or `enterprisedb`). Required only if the database is running as a service. |
| `db.service.name` | The name of the database service (used to restart the server). Required only if the database is running as a service. |
| `db.bin` | The path to the `bin` directory (used for calls to `pg_ctl`). |
| `db.data.dir` | The `data` directory in which EFM will find or create the `recovery.conf` file or the `standby.signal` file. |
| `user.email` | An email address at which to receive email notifications (notification text is also in the agent log file). |
| `bind.address` | The local address of the node and the port to use for Failover Manager. The format is: `bind.address=1.2.3.4:7800` |
| `is.witness` | `true` on a witness node and `false` if it is a primary or standby. |
| `ping.server.ip` | If you are running on a network without Internet access, set `ping.server.ip` to an address that is available on your network. |
| `auto.allow.hosts` | On a test cluster, set to `true` to simplify startup; for production usage, consult the Failover Manager User Guide. |
| `stable.nodes.file` | On a test cluster, set to `true` to simplify startup; for production usage, consult the Failover Manager User Guide. |
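
Before starting an agent, it can help to confirm that none of these properties was left empty. The following is a minimal sketch, not a Failover Manager feature: `missing_properties` is a hypothetical helper name, and it assumes each property is written as `key=value` at the start of a line with no spaces around the `=`, as in the `bind.address` example. It checks only the properties from the table that apply to every primary or standby node.

```shell
#!/usr/bin/env bash
# Sketch: list tutorial properties that are absent or have an empty value.
# missing_properties is a hypothetical helper, not part of Failover Manager.

missing_properties() {
    # $1 = path to a <cluster_name>.properties file
    local file="$1" key
    for key in db.user db.password.encrypted db.port db.database \
               db.bin db.data.dir user.email bind.address is.witness; do
        # A property counts as set when a line starts with "key=" and
        # at least one character follows the "=".
        if ! awk -v k="$key" '
                index($0, k "=") == 1 && length($0) > length(k) + 1 { found = 1 }
                END { exit !found }
            ' "$file"; then
            printf '%s\n' "$key"
        fi
    done
}
```

Running `missing_properties /etc/edb/efm-4.6/efm.properties` prints one property name per line for each value that still needs attention, and prints nothing when all of them are set. On a witness node, the database-related names can be ignored.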

Update efm.nodes. The <cluster_name>.nodes file (efm.nodes in this example) is read at startup to tell an agent how to find the rest of the cluster or, in the case of the first node started, can be used to simplify authorization of subsequent nodes. Add the addresses and ports of each node in the cluster to this file. One node acts as the membership coordinator. Include in the list at least the membership coordinator's address. For example:
1.2.3.4:7800
1.2.3.5:7800
1.2.3.6:7800
The Failover Manager agent doesn't validate the addresses in the efm.nodes file. The agent expects that some of the addresses in the file can't be reached (for example, that another agent hasn’t been started yet).
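
If all agents use the same port, the file contents can be generated from a list of addresses. This is a small sketch under that assumption; `write_nodes` is a hypothetical helper name, and the addresses and port are the example values used in this tutorial.

```shell
#!/usr/bin/env bash
# Sketch: print address:port lines suitable for a <cluster_name>.nodes file.
# write_nodes is a hypothetical helper, not an efm command.

write_nodes() {
    # $1 = port shared by all agents, remaining arguments = node addresses
    local port="$1" addr
    shift
    for addr in "$@"; do
        printf '%s:%s\n' "$addr" "$port"
    done
}
```

For example, `write_nodes 7800 1.2.3.4 1.2.3.5 1.2.3.6 > /etc/edb/efm-4.6/efm.nodes` writes the three example lines shown above. Redirecting into the existing file keeps its current ownership.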
Configure the other nodes. Copy the efm.properties and efm.nodes files to /etc/edb/efm-4.6 on the other nodes in your sample cluster. After copying the files, change the file ownership so the files are owned by efm:efm. The efm.properties file can be the same on every node, except for the following properties:
- Modify the bind.address property to use the node’s local address.
- Set is.witness to true if the node is a witness node. If the node is a witness node, the properties relating to a local database installation are ignored.
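
The per-node changes can be scripted after the files are copied. The sketch below is one possible approach, not part of Failover Manager: `set_property` is a hypothetical helper name, and it assumes the property already appears in the file as `key=value` at the start of a line.

```shell
#!/usr/bin/env bash
# Sketch: change the value of one property in a copied properties file.
# set_property is a hypothetical helper, not an efm command.

set_property() {
    # $1 = properties file, $2 = property name, $3 = new value
    local file="$1" key="$2" value="$3" tmp
    tmp="$(mktemp)"
    # Rewrite only the line that starts with "key="; copy all other lines as-is.
    # Writing back with cat keeps the original file's owner and permissions.
    awk -v k="$key" -v v="$value" '
        index($0, k "=") == 1 { print k "=" v; next }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

On the node with the address 1.2.3.5, for example, you might run `set_property /etc/edb/efm-4.6/efm.properties bind.address 1.2.3.5:7800`. Because awk interprets backslash escapes in values passed with `-v`, this sketch is suited to simple values such as addresses and `true` or `false`.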
Start the Failover Manager cluster. On any node, start the Failover Manager agent. The agent is named edb-efm-4.6; you can use your platform-specific service command to control the service. For example, on a RHEL 7.x or Rocky Linux/AlmaLinux/RHEL 8.x host, use the command:
```
systemctl start edb-efm-4.6
```
After the agent starts, run the following command to see the status of the single-node cluster. The addresses of the other nodes appear in the Allowed node host list.
```
/usr/edb/efm-4.6/bin/efm cluster-status efm
```
Start the agent on the other nodes. Run the efm cluster-status efm command on any node to see the cluster status.
If any agent fails to start, see the startup log for information about what went wrong:
```
cat /var/log/efm-4.6/startup-efm.log
```

Perform a switchover

If the cluster status output shows that the primary and standby nodes are in sync, you can perform a switchover:

`/usr/edb/efm-4.6/bin/efm promote efm -switchover`

The command promotes a standby and reconfigures the primary database as a new standby in the cluster. To switch back, run the command again.

Access online help

For quick access to online help, use:

`/usr/edb/efm-4.6/bin/efm --help`