Provide Fault Tolerance with Redundancy Zones
Enterprise Only
The redundancy zone functionality demonstrated here requires HashiCorp Cloud Platform (HCP) or self-managed Consul Enterprise. If you've purchased or wish to try out Consul Enterprise, refer to how to access Consul Enterprise.
In this tutorial, you will configure fault resiliency for your Consul datacenter using redundancy zones.
Redundancy zones are a Consul autopilot feature that makes it possible to run one voter and any number of non-voters in each defined zone. Note that there can be only one voter per zone.
For this tutorial, you will use one voter and one non-voter in each of three availability zones, for a total of six servers. If an entire availability zone is lost, both its voter and non-voter are lost, but the datacenter remains available. If only the voter in an availability zone is lost, autopilot automatically promotes the non-voter to voter, putting the hot standby server into service quickly.
You can use this tutorial to implement isolated failure domains such as AWS Availability Zones (AZ) to obtain redundancy within an AZ without having to sustain the overhead of a large quorum.
Prerequisites
To set up a Consul datacenter with three availability zones, one voter per zone, you will need:
- A Consul Enterprise datacenter with three servers.
- Three extra nodes with the Consul Enterprise binary installed, to be used as non-voters.
You will also need a text editor, the curl executable to test the API endpoints, and optionally the jq command to format the output of curl.
Configure servers for redundancy zones
The first step in using the redundancy zones functionality is to reconfigure your existing datacenter to divide the servers into three different zones.
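The zone assignment is expressed through node metadata. Below is a minimal sketch of what the addition to a server's agent configuration might look like, assuming HCL configuration files; the meta key name (zone) and value (zone1) are illustrative, and the servers in the other zones would carry zone2 and zone3 respectively.

```hcl
# Added to the existing server agent configuration.
# Key name and value are illustrative; use one value per availability zone.
node_meta {
  zone = "zone1"
}
```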
You can verify the configuration is in place using the /agent/self API endpoint.
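For example, assuming a local agent listening on the default address and port, the Meta field of the response should contain the zone you configured:

```shell
# Query the local agent and filter the node metadata (jq is optional).
curl --silent http://127.0.0.1:8500/v1/agent/self | jq .Meta
```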
The node_meta configuration needs to be applied to all servers. Check the other tabs to find the configuration and instructions for the other servers in the datacenter.
Once the configuration is updated on all the servers, you can update the Consul autopilot configuration to reflect the node_meta configuration.
The command can be executed from any of the Consul nodes in your datacenter and will be automatically propagated across the datacenter.
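A sketch of the command, assuming the servers were labeled with a node meta key named zone as shown earlier:

```shell
# Tell autopilot which node meta key identifies the redundancy zone.
consul operator autopilot set-config -redundancy-zone-tag=zone
```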
Verify server configuration
You can verify the configuration is now in place using the get-config option for the operator command.
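For example:

```shell
# Print the current autopilot configuration; the RedundancyZoneTag field
# should now report the node meta key you configured (here, "zone").
consul operator autopilot get-config
```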
The configuration can alternatively be applied in the agent configuration file with the redundancy_zone_tag option. For the file-based approach, the parameter needs to be in place when the datacenter is bootstrapped for the first time, and later changes to the configuration file require an agent restart. For this reason, we recommend either setting the option at bootstrap time or using the CLI, which does not cause downtime.
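If you prefer the file-based approach, the setting would look roughly like the following sketch, placed in each server's agent configuration; the tag value assumes the zone meta key used earlier.

```hcl
# Agent configuration equivalent of the CLI command above.
autopilot {
  redundancy_zone_tag = "zone"
}
```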
Add new servers to your datacenter
Once the existing datacenter is configured for redundancy zones, you can add the new servers.
This time you will include the autopilot configuration in the configuration files since you are starting the servers for the first time.
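A sketch of what one of the new servers' configuration might contain, showing only the settings relevant to redundancy zones; the node name, zone value, and join address are placeholders for your environment.

```hcl
# Hypothetical configuration for a new server joining zone1 as a non-voter candidate.
server     = true
node_name  = "server-1a"
retry_join = ["<existing-server-address>"]

node_meta {
  zone = "zone1"
}

autopilot {
  redundancy_zone_tag = "zone"
}
```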
After starting all the new servers, you can check the configuration with the operator command. All the new servers, once started, are added to the datacenter as non-voters. You can reference the Voter column in the output to verify it.
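A sketch of the check; the node names, IDs, and addresses below are illustrative.

```shell
$ consul operator raft list-peers
Node       ID      Address          State     Voter  RaftProtocol
server-1   <id>    10.0.4.11:8300   leader    true   3
server-2   <id>    10.0.4.12:8300   follower  true   3
server-3   <id>    10.0.4.13:8300   follower  true   3
server-1a  <id>    10.0.4.21:8300   follower  false  3
server-2a  <id>    10.0.4.22:8300   follower  false  3
server-3a  <id>    10.0.4.23:8300   follower  false  3
```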
Test fault tolerance
To verify your configuration, stop one of the voters, in this case server-2, and verify that the corresponding non-voter in its redundancy zone gets promoted to voter as soon as the server is declared unhealthy.
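A sketch of the test, assuming Consul runs as a systemd service on the servers; adapt the stop command to however you run Consul in your environment.

```shell
# On server-2: stop the Consul agent to simulate a voter failure.
sudo systemctl stop consul

# From any remaining server: watch the Raft peers until server-2a reports Voter = true.
consul operator raft list-peers
```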
Once server-2a gets promoted to voter, you can start Consul on server-2 again and verify that the one-voter-per-redundancy-zone rule is still respected.
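Continuing the same sketch:

```shell
# On server-2: bring the agent back.
sudo systemctl start consul

# server-2 rejoins as a non-voter; its zone still has exactly one voter (server-2a).
consul operator raft list-peers
```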
Next steps
In this tutorial you learned how to configure Consul autopilot's redundancy zones to add a pool of read replica servers to your datacenter and use them as hot standbys in case one of the voters fails.
Consul Enterprise offers one more option for configuring non-voting servers in a datacenter: the enhanced read scalability functionality, which lets you define a set of servers used only to ease the read load on the voting servers. Note that if you configure a server with non_voting_server set to true, the server will never be promoted to a voter, even if a voter server fails.
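A minimal sketch of such a server's configuration, using the non_voting_server option named above:

```hcl
# Dedicated read-scalability server: never promoted to voter, even if a voter fails.
server            = true
non_voting_server = true
```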
You can learn more about other autopilot functionalities by checking our autopilot tutorial.