Provide Fault Tolerance with Redundancy Zones

  • 12min

  • Enterprise
  • Consul

Enterprise Only: The redundancy zone functionality demonstrated here requires HashiCorp Cloud Platform (HCP) or self-managed Consul Enterprise. If you've purchased or wish to try out Consul Enterprise, refer to how to access Consul Enterprise.

In this tutorial, you will configure fault resiliency for your Consul datacenter using redundancy zones.

Redundancy zones are a Consul autopilot feature that lets you run one voter and any number of non-voters in each defined zone. Note that there can be only one voter per zone.

For this tutorial, you will use one voter and one non-voter in each of three availability zones, for a total of six servers. If an entire availability zone is lost, both its voter and non-voter are lost; however, the datacenter remains available because the other two zones still hold two of the three voters, which is enough to maintain quorum. If only the voter in an availability zone is lost, autopilot automatically promotes that zone's non-voter to voter, quickly putting the hot standby server into service.
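
If you want to check how many voting servers the datacenter can lose while keeping quorum, autopilot exposes a FailureTolerance field through its health API endpoint. The command below is a minimal sketch, assuming the HTTP API is reachable on the default localhost:8500 address and that jq is installed; in a healthy three-voter datacenter it should report a failure tolerance of 1.

$ curl -s localhost:8500/v1/operator/autopilot/health | jq '{Healthy, FailureTolerance}'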

You can use the approach in this tutorial to map redundancy zones onto isolated failure domains such as AWS Availability Zones (AZs), gaining redundancy within each failure domain without the overhead of a large quorum.
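
For example, if your servers run on AWS, a natural convention is to use the AZ name itself as the zone value in node_meta. The snippet below is only a sketch; the AZ name is a hypothetical placeholder, not part of this tutorial's setup.

## ...
node_meta {
  # hypothetical example: mirror the AWS Availability Zone hosting this server
  zone = "us-east-1a"
}
## ...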

Prerequisites

To set up a Consul datacenter with three availability zones, one voter per zone, you will need:

  • A Consul Enterprise datacenter with three servers.
  • Three extra nodes with the Consul Enterprise binary installed, to be used as non-voters.

You will also need a text editor, the curl executable to test the API endpoints, and optionally the jq command to format the curl output.
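
Before you begin, you can run a quick sanity check to confirm that the tools are installed and that the Consul CLI can reach your datacenter. This is an optional sketch, assuming a local agent listening on the default address:

$ consul version
$ consul members
$ curl --version
$ jq --version

The consul members output should list the three existing servers; the three extra nodes will appear once they join the datacenter later in the tutorial.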

Configure servers for redundancy zones

The first step in using redundancy zones is to reconfigure your existing datacenter so that the servers are divided into three different zones.

server-1 agent configuration (HCL)
## ...
node_meta {
  zone = "zone1"
}
## ...

server-1 agent configuration (JSON)
## ...
  "node_meta": {
    "zone": "zone1"
  },
## ...
$ consul reload
Configuration reload triggered

You can verify the configuration is in place using the /agent/self API endpoint.

$ curl localhost:8500/v1/agent/self | jq
{
  "Config": {
    "Datacenter": "dc1",
    "NodeName": "server-1",
    "NodeID": "738f19c4-8543-eef2-6f83-e20544b863dd",
    "Revision": "4c18cd19a",
    "Server": true,
    "Version": "1.7.2+ent"
  },
#...
  "Meta": {
    "consul-network-segment": "",
    "zone": "zone1"
  }
}
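
If you only want to confirm the zone value, you can filter the same response with jq. This is just a convenience sketch that assumes the agent's HTTP API on the default localhost:8500 address.

$ curl -s localhost:8500/v1/agent/self | jq -r .Meta.zone
zone1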

The node_meta configuration needs to be applied to all servers. The configuration for the other servers in the datacenter follows.

server-2 agent configuration (HCL)
## ...
node_meta {
  zone = "zone2"
}
## ...

server-2 agent configuration (JSON)
## ...
  "node_meta": {
    "zone": "zone2"
  },
## ...
$ consul reload
Configuration reload triggered

server-3 agent configuration (HCL)
## ...
node_meta {
  zone = "zone3"
}
## ...

server-3 agent configuration (JSON)
## ...
  "node_meta": {
    "zone": "zone3"
  },
## ...
$ consul reload
Configuration reload triggered

Once the configuration is updated on all the servers, you can update the Consul autopilot configuration to reflect the node_meta settings.

$ consul operator autopilot set-config -redundancy-zone-tag=zone
Configuration updated!

You can run the command from any of the Consul nodes in your datacenter; the change is automatically propagated across the datacenter.

Verify server configuration

You can verify the configuration is now in place using the get-config option for the operator command.

$ consul operator autopilot get-config
CleanupDeadServers = true
LastContactThreshold = 200ms
MaxTrailingLogs = 250
MinQuorum = 0
ServerStabilizationTime = 10s
RedundancyZoneTag = "zone"
DisableUpgradeMigration = false
UpgradeVersionTag = ""
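
The same settings are also readable over the HTTP API through the /v1/operator/autopilot/configuration endpoint, if you prefer to verify them with curl. A minimal sketch, again assuming the agent's default localhost:8500 address:

$ curl -s localhost:8500/v1/operator/autopilot/configuration | jq .RedundancyZoneTag
"zone"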

The configuration can alternatively be applied in the agent configuration file with the redundancy_zone_tag option. However, that option only takes effect when the datacenter is first bootstrapped, and later changes to the configuration file require an agent restart. We therefore recommend either setting the option at bootstrap time or using the CLI, which does not cause downtime.

Add new servers to your datacenter

Once the existing datacenter is configured for redundancy zones, you can add the new servers.

This time you will include the autopilot configuration in the configuration files since you are starting the servers for the first time.

server-1a agent configuration (HCL)
## ...
node_meta {
  zone = "zone1"
}
autopilot {
  redundancy_zone_tag = "zone"
}
## ...

server-1a agent configuration (JSON)
## ...
  "node_meta": {
    "zone": "zone1"
  },
  "autopilot":{
    "redundancy_zone_tag":"zone"
  },
## ...

server-2a agent configuration (HCL)
## ...
node_meta {
  zone = "zone2"
}
autopilot {
  redundancy_zone_tag = "zone"
}
## ...

server-2a agent configuration (JSON)
## ...
  "node_meta": {
    "zone": "zone2"
  },
  "autopilot":{
    "redundancy_zone_tag":"zone"
  },
## ...

server-3a agent configuration (HCL)
## ...
node_meta {
  zone = "zone3"
}
autopilot {
  redundancy_zone_tag = "zone"
}
## ...

server-3a agent configuration (JSON)
## ...
  "node_meta": {
    "zone": "zone3"
  },
  "autopilot":{
    "redundancy_zone_tag":"zone"
  },
## ...

After starting all the new servers, you can check the configuration with the operator command.

$ consul operator raft list-peers
Node           ID                                    Address           State     Voter  RaftProtocol
server-1       738f19c4-8543-eef2-6f83-e20544b863dd  10.20.10.11:8300  leader    true   3
server-2       feffe44d-b0c6-809f-97b6-cb7143b5cb9d  10.20.10.12:8300  follower  true   3
server-3       6154e025-55ad-89a5-298d-6b7ae6cfb0f8  10.20.10.13:8300  follower  true   3
server-1a      43afac13-f5af-b06f-8a9b-0092244790df  10.20.10.21:8300  follower  false  3
server-2a      a5842855-58a6-197d-694d-b56eab9acc5c  10.20.10.22:8300  follower  false  3
server-3a      96f1d875-d57e-558e-2707-a32a63666bfb  10.20.10.23:8300  follower  false  3

All the new servers, once started, join the datacenter as non-voters. You can check the Voter column in the output to verify this.

Test fault tolerance

To test your configuration, stop one of the voters, in this case server-2, and verify that the corresponding non-voter in its redundancy zone is promoted to voter as soon as the stopped server is declared unhealthy.
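
How you stop the server depends on how Consul runs on that node. The command below is only illustrative and assumes a systemd-managed service named consul on server-2:

# run on server-2 (hypothetical systemd setup)
$ sudo systemctl stop consul

Once the failure is detected, list the raft peers again from one of the remaining servers.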

$ consul operator raft list-peers
Node           ID                                    Address           State     Voter  RaftProtocol
server-1       738f19c4-8543-eef2-6f83-e20544b863dd  10.20.10.11:8300  leader    true   3
server-3       6154e025-55ad-89a5-298d-6b7ae6cfb0f8  10.20.10.13:8300  follower  true   3
server-1a      43afac13-f5af-b06f-8a9b-0092244790df  10.20.10.21:8300  follower  false  3
server-2a      a5842855-58a6-197d-694d-b56eab9acc5c  10.20.10.22:8300  follower  true   3
server-3a      96f1d875-d57e-558e-2707-a32a63666bfb  10.20.10.23:8300  follower  false  3

Once server-2a is promoted to voter, you can start Consul on server-2 again and verify that the one-voter-per-redundancy-zone rule is still respected: server-2 rejoins as a non-voter, as shown in the Voter column below.

$ consul operator raft list-peers
Node           ID                                    Address           State     Voter  RaftProtocol
server-1       738f19c4-8543-eef2-6f83-e20544b863dd  10.20.10.11:8300  leader    true   3
server-3       6154e025-55ad-89a5-298d-6b7ae6cfb0f8  10.20.10.13:8300  follower  true   3
server-1a      43afac13-f5af-b06f-8a9b-0092244790df  10.20.10.21:8300  follower  false  3
server-2a      a5842855-58a6-197d-694d-b56eab9acc5c  10.20.10.22:8300  follower  true   3
server-3a      96f1d875-d57e-558e-2707-a32a63666bfb  10.20.10.23:8300  follower  false  3
server-2       feffe44d-b0c6-809f-97b6-cb7143b5cb9d  10.20.10.12:8300  follower  false  3

Next steps

In this tutorial, you learned how to configure Consul autopilot's redundancy zones to add a pool of read-replica servers to your datacenter and use them as hot standbys in case one of the voters fails.

Consul Enterprise offers one more option for configuring non-voting servers in a datacenter: the enhanced read scalability functionality, which lets you define a set of servers used only to offload reads from the voting servers. Note that a server configured with non_voting_server set to true will never be promoted to voter, even if a voting server fails.
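
For reference, that behavior is controlled by a single agent configuration option. The snippet below is only a sketch of the relevant stanza and is not part of this tutorial's setup.

## ...
non_voting_server = true
## ...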

You can learn more about other autopilot features in our autopilot tutorial.
