Consul
Upgrade WAN-federated Consul datacenters
This page describes the process to upgrade multiple Consul datacenters that are joined with each other by WAN federation.
Overview
A fundamental requirement for many production environments is zero downtime when you upgrade to a new version.
Consul's support for upgrading clusters with zero downtime also extends to multi-datacenter operations in a federated Consul environment.
To upgrade your Consul datacenter's version:
- Perform a rolling upgrade and restart of the Consul agents on the server nodes in the primary datacenter.
- Perform a rolling upgrade and restart of all the client agents in the primary datacenter.
- Perform a rolling upgrade and restart of the Consul agents on the server nodes in the secondary datacenter.
- Perform a rolling upgrade and restart of all the client agents in the secondary datacenter.
Repeat this process for each secondary datacenter in the WAN-federated environment. This approach allows you to maintain every datacenter's availability in the WAN-federated network throughout the entire upgrade process.
Prerequisites
The following instructions require two Consul datacenters joined by WAN federation, with access control list (ACL) replication enabled.
If you have not met these deployment requirements, refer to the following documentation to set up your datacenters:
- Deployment Guide
- Securing Consul with ACLs
- Basic Federation with WAN Gossip
- ACL Replication for Multiple Datacenters
If you are using Consul's service mesh in your WAN-federated environment, you should also set enable_central_service_config = true on your Consul clients, which allows you to centrally configure the sidecar and mesh gateway proxies.
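In a client agent's HCL configuration, that setting might look like the following sketch; the filename and surrounding settings are illustrative:

```hcl
## consul-client.hcl (illustrative)
## ...
enable_central_service_config = true
## ...
```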
To verify that your datacenters are successfully federated, use the consul members -wan command:
$ consul members -wan
Node Address Status Type Build Protocol DC Partition Segment
consul-server-0.primary 172.20.0.10:8302 alive server 1.19.2 2 primary default <all>
consul-server-1.primary 172.20.0.9:8302 alive server 1.19.2 2 primary default <all>
consul-server-2.primary 172.20.0.14:8302 alive server 1.19.2 2 primary default <all>
consul-server-0.secondary 172.20.0.5:8302 alive server 1.19.2 2 secondary default <all>
consul-server-1.secondary 172.20.0.4:8302 alive server 1.19.2 2 secondary default <all>
consul-server-2.secondary 172.20.0.8:8302 alive server 1.19.2 2 secondary default <all>
Prepare for upgrade
Before you upgrade your datacenters, take the following steps to prepare for the process:
- Check for additional Consul version compatibility requirements
- Take a snapshot to back up your primary datacenter
- Increase agent log verbosity to debug errors
Check Consul version compatibility
For all Consul upgrades, we suggest jumping at most two major versions. For example, if 1.21.x is the current release, do not upgrade from versions older than 1.19.x. This helps avoid risks during upgrades. Consul Enterprise users can make longer jumps between some versions with Consul Long-Term Support.
To verify that your upgrade path is safe, check the upgrade instructions and the version-specific upgrade notes for every release between your current and target versions.
Check Envoy version compatibility
If you are using Consul's service mesh, when you upgrade Consul on the client nodes you must also ensure that the Envoy version on each node is compatible with the new version of Consul.
You can find the Envoy versions that your new Consul version supports on the Envoy proxy configuration reference documentation page.
Tip
If you are using Consul 1.8.4 or later you can also use the /v1/agent/self API endpoint to check the compatible Envoy version for your running agent.
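For example, assuming jq is installed, you can extract the supported Envoy versions from the endpoint's xDS.SupportedProxies field. The response fragment below is an illustrative sample; on a live agent you would pipe the output of curl against the agent's HTTP address instead.

```shell
# Illustrative fragment of a /v1/agent/self response; on a live agent, run:
#   curl -s http://127.0.0.1:8500/v1/agent/self | jq '.xDS.SupportedProxies'
self_fragment='{"xDS":{"SupportedProxies":{"envoy":["1.33.2","1.32.4","1.31.5"]}}}'

# List the Envoy versions this Consul build supports
echo "$self_fragment" | jq -r '.xDS.SupportedProxies.envoy[]'
```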
In this scenario, we are upgrading from Consul 1.19.2, which supports Envoy up to version 1.33.x, to Consul 1.21.4, which supports Envoy from version 1.31.x.
That means the two versions share a range of compatible Envoy versions, so if you are already running Envoy 1.33.x with your agents you might not need to upgrade Envoy right away.
We recommend that you upgrade Envoy to the latest supported version so that you can leverage new Envoy functionalities and improvements in your network.
Refer to Envoy documentation to learn how to install Envoy on your system.
Take a snapshot
Even though this procedure should not result in any data loss in Consul state, we still recommend creating a backup by taking a snapshot of your environment every time you perform a potentially disruptive operation such as an upgrade.
To learn how to perform a snapshot on an existing Consul datacenter, refer to Backup a Consul datacenter.
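For example, you can create and verify a snapshot of the primary datacenter with the consul snapshot subcommands; the filename is illustrative:

```shell
# Save a snapshot of the datacenter state to a local file
consul snapshot save backup-primary.snap

# Inspect the snapshot to confirm it was written correctly
consul snapshot inspect backup-primary.snap
```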
Increase log verbosity on agents
To debug potential issues with the upgrade process, temporarily set your Consul server agents' log_level configuration to debug. This setting provides more information to work with in the event that something goes wrong.
consul.hcl
## ...
log_level = "debug"
## ...
Use the following command on your servers to reload the configuration.
$ consul reload
Configuration reload triggered
Be sure to apply this change on all Consul nodes you are upgrading.
Install new Consul version on agents
Install the new version of Consul on every node where a Consul agent runs.
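One common approach on Linux is to download the release binary and replace the one in your PATH. The version and platform below match this scenario and may differ in your environment:

```shell
# Download and unpack the Consul 1.21.4 release binary (linux_amd64 assumed)
curl -sO https://releases.hashicorp.com/consul/1.21.4/consul_1.21.4_linux_amd64.zip
unzip consul_1.21.4_linux_amd64.zip

# Replace the binary in the system path; the running agent keeps using its
# old binary until you restart it
sudo mv consul /usr/local/bin/consul
```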
Server rolling upgrade
In order to minimize unavailability, perform the upgrade process one server at a time. Start with the followers and upgrade the Raft leader last.
To identify the Raft leader in your datacenter, use the consul operator command.
$ consul operator raft list-peers
Node ID Address State Voter RaftProtocol Commit Index Trails Leader By
consul-server-1 7eac8bf6-20dc-fd79-4a8e-a478d763d71a 172.20.0.4:8300 leader true 3 214 -
consul-server-0 5b4cd28e-708c-0cea-4d5a-cbe534fce408 172.20.0.5:8300 follower true 3 214 0 commits
consul-server-2 b87109a8-dd55-76ce-8cf4-123953daa62a 172.20.0.8:8300 follower true 3 214 0 commits
In the example output above, the leader is consul-server-1. Upgrade consul-server-0 and consul-server-2 first, and then upgrade consul-server-1 last.
Note
The server upgrade is not complete until you have performed the upgrade steps on all servers.
Leave datacenter
Log in to the server agent's node and issue the consul leave command.
$ consul leave
Graceful leave complete
Wait until the command returns.
After the server leaves the datacenter, you should observe output similar to the following.
$ consul members
Node Address Status Type Build Protocol DC Partition Segment
consul-server-0 172.20.0.10:8301 left server 1.19.2 2 primary default <all>
consul-server-1 172.20.0.9:8301 alive server 1.19.2 2 primary default <all>
consul-server-2 172.20.0.14:8301 alive server 1.19.2 2 primary default <all>
Confirm from the output that the server's status now appears as left.
Start new Consul version
Before starting Consul, check the version for the binary in your path.
$ consul version
Consul v1.21.4
Revision 59b8b905
Build Date 2025-08-13T12:03:12Z
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)
Start Consul using the new binary version. Confirm the node with the updated version joined the datacenter again.
$ consul members
Node Address Status Type Build Protocol DC Partition Segment
consul-server-0 172.20.0.10:8301 alive server 1.21.4 2 primary default <all>
consul-server-1 172.20.0.9:8301 alive server 1.19.2 2 primary default <all>
consul-server-2 172.20.0.14:8301 alive server 1.19.2 2 primary default <all>
Confirm that all three servers are marked as alive, and that consul-server-0 now shows 1.21.4 as its version.
Update ACL token for agent
If your ACL configuration did not include enable_token_persistence = true and you did not set the server tokens in the configuration files, you must add the agent and default tokens to the agent again before it can rejoin the datacenter.
Refer to Apply individual tokens to agents for detailed steps.
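If you manage the tokens manually, reapplying them after the restart might look like the following; the token values are placeholders:

```shell
# Reapply the ACL tokens to the restarted agent (values are placeholders)
consul acl set-agent-token default "<default-token>"
consul acl set-agent-token agent "<agent-token>"
```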
Leader restart
It is important to perform the upgrade one server at a time to ensure there are enough alive servers to maintain quorum.
Once enough servers with the new version join the Raft cluster, a new leader election may be triggered even without the leader leaving the datacenter or being restarted. If you follow this process, Consul gracefully reacts to the leave and restart of the server agents and remains available during the entire upgrade.
In this scenario, each datacenter runs a total of 3 servers. That means you can maintain quorum as long as 2 servers are still part of the Raft cluster.
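You can check the quorum arithmetic directly: a Raft cluster of N voting servers needs floor(N/2) + 1 of them available.

```shell
# Quorum for an N-server Raft cluster is floor(N/2) + 1
servers=3
echo $(( servers / 2 + 1 ))   # prints 2: only one server may be down at a time
```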
Client rolling upgrade
After the servers are correctly restarted and they re-join the datacenter, perform the same steps on every client agent to finish upgrading Consul's version in the datacenter.
- Install the new version of Consul on the client.
- Stop the client agent using consul leave.
- Start Consul using the new binary.
- Verify the client is correctly shown in consul members.
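If your clients run Consul under systemd, the rolling client upgrade can be sketched as the following loop; the hostnames, staged binary path, and service name are assumptions for illustration:

```shell
# Hypothetical rolling client upgrade; adjust hostnames, paths, and the
# service manager to your environment
for node in consul-client-0 consul-client-1 consul-client-2; do
  # Gracefully remove the client from the datacenter
  ssh "$node" "consul leave"
  # Install the new binary (assumes it was staged at /tmp/consul-1.21.4)
  ssh "$node" "sudo mv /tmp/consul-1.21.4 /usr/local/bin/consul"
  # Start the agent again with the new version
  ssh "$node" "sudo systemctl start consul"
  # Confirm the node rejoined before moving on to the next one
  consul members | grep "$node"
done
```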
Service downtime considerations
When you upgrade a client agent, the services registered with that agent are marked as unhealthy and are not discoverable from the time you issue the consul leave command until the Consul client agent has restarted.
If you run multiple instances of the same service, each with its own Consul client agent, the service remains available in your datacenter while other nodes upgrade.
Update secondary datacenters
After the upgrade of the primary datacenter is complete, follow the same steps to upgrade each secondary datacenter.
Verify federation
After the upgrade process is complete in all datacenters, you can confirm federation is still working using the consul members command.
$ consul members -wan
Node Address Status Type Build Protocol DC Partition Segment
consul-server-0.primary 172.20.0.10:8302 alive server 1.21.4 2 primary default <all>
consul-server-1.primary 172.20.0.9:8302 alive server 1.21.4 2 primary default <all>
consul-server-2.primary 172.20.0.14:8302 alive server 1.21.4 2 primary default <all>
consul-server-0.secondary 172.20.0.5:8302 alive server 1.21.4 2 secondary default <all>
consul-server-1.secondary 172.20.0.4:8302 alive server 1.21.4 2 secondary default <all>
consul-server-2.secondary 172.20.0.8:8302 alive server 1.21.4 2 secondary default <all>
To confirm that ACL replication is still active in the secondary datacenter, query the /v1/acl/replication REST endpoint on any Consul agent in the secondary datacenter. The following example queries one of the secondary datacenter's servers.
$ curl -s -H "X-Consul-Token: $CONSUL_HTTP_TOKEN" https://consul-server-0.secondary/v1/acl/replication?pretty
{
"Enabled": true,
"Running": true,
"SourceDatacenter": "primary",
"ReplicationType": "tokens",
"ReplicatedIndex": 627,
"ReplicatedRoleIndex": 1,
"ReplicatedTokenIndex": 631,
"LastSuccess": "2025-09-15T17:16:28Z",
"LastError": "0001-01-01T00:00:00Z"
}
You must send the request to a secondary datacenter because the primary datacenter always appears as having replication disabled, even when replication is happening.
Next steps
This upgrade process can be safely applied to your production datacenter without the risk of downtime. However, we still recommend that you test any disruptive operation, such as an upgrade, in a test environment that replicates your production environment.
To simplify the upgrade process, Enterprise users can leverage the Automated Upgrades feature of Consul autopilot.