Consul Multi-Cluster Disaster Recovery Considerations
This guide applies to Consul versions 1.8 - 1.10.
The disaster recovery considerations for Consul single cluster deployments also apply to Consul multi-cluster deployments. However, there are a few additional considerations that are specific to Consul multi-cluster deployments.
When you design and architect your Consul environment, it is important to consider the critical role of the primary datacenter within the multi-cluster deployment. The primary Consul datacenter serves as the ‘source of truth’ for the following data.
- Certificate Authority management, if you use the built-in Consul CA. The root CA resides in the primary Consul datacenter and must sign the certificates for the additional Consul datacenters.
Therefore, it is important to consider both the placement of the primary Consul datacenter and the steps required to recover it from a disaster. The recommended approach is reviewed in detail below.
Clientless primary Consul datacenter
Once you establish and federate a primary Consul datacenter, you cannot migrate, change, or move it. An effective pattern for large Consul multi-cluster deployments is to have a dedicated primary Consul datacenter with the sole purpose of serving as a primary. You would include only Consul servers in this primary datacenter and not connect any client nodes or services. This primary Consul datacenter can then be federated normally with other Consul datacenters, each of which contains both servers and clients.
This approach provides two distinct advantages.
- It becomes easier to move the primary Consul datacenter. For example, you may want to migrate it from an on-premises datacenter to a cloud environment. Typically, this entails performing a backup and restore of the primary Consul datacenter to the alternate location. Review the Disaster Recovery for the Primary Datacenter tutorial for guidance on restoring a Consul cluster.
- In the event of a disaster, the additional Consul datacenters can continue to function independently of the primary Consul datacenter, although functionality is reduced until the primary Consul datacenter is brought back online. See the table below for more details.
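The backup and restore step mentioned above can be sketched against Consul's HTTP snapshot endpoint (`/v1/snapshot`). This is a minimal illustration, not the procedure from the referenced tutorial: the helper names (`snapshot_url`, `build_request`, `save_snapshot`, `restore_snapshot`), the server address, and the file path are all hypothetical, and a real migration would also need to account for TLS, ACL tokens with sufficient privileges, and verifying the restored cluster.

```python
import shutil
import urllib.parse
import urllib.request


def snapshot_url(base_url, datacenter=None):
    """Build the snapshot endpoint URL, optionally targeting a datacenter."""
    url = base_url.rstrip("/") + "/v1/snapshot"
    if datacenter:
        url += "?" + urllib.parse.urlencode({"dc": datacenter})
    return url


def build_request(base_url, method, token=None, datacenter=None, data=None):
    """Create the HTTP request; the ACL token is sent in X-Consul-Token."""
    headers = {}
    if token:
        headers["X-Consul-Token"] = token
    return urllib.request.Request(
        snapshot_url(base_url, datacenter),
        data=data,
        headers=headers,
        method=method,
    )


def save_snapshot(base_url, path, token=None, datacenter=None):
    """Stream a snapshot from a Consul server to a local file (GET)."""
    req = build_request(base_url, "GET", token, datacenter)
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)


def restore_snapshot(base_url, path, token=None, datacenter=None):
    """Upload a previously saved snapshot to a Consul server (PUT)."""
    with open(path, "rb") as snapshot:
        payload = snapshot.read()
    req = build_request(base_url, "PUT", token, datacenter, data=payload)
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

In this sketch, you would call `save_snapshot` against a server in the existing primary datacenter and `restore_snapshot` against a server in the new location. The same operations are also available through the `consul snapshot save` and `consul snapshot restore` commands.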
Primary Consul datacenter outage behaviors
The table below assumes that the primary Consul datacenter is offline. References to 'any Consul datacenter' do not include the primary Consul datacenter.
| Consul Cluster Functionality | Within local Consul datacenter | Within any Consul datacenter | Comments |
|---|---|---|---|
| Read ACLs | ✔ | ✔ | Assumes that the default setting of ‘extend-cache’ is used for the ACL down policy |
| Read Intentions | ✔ | ✔ | Assumes that Intentions were created when the primary datacenter was online |
| Create/Read/Update/Delete KV Store items | ✔ | ✔ | |
| Certificate Generation & Renewal | ✖ | ✖ | Certificates must be signed by the primary Consul datacenter |
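The "Read ACLs" row depends on the ACL down policy configured on the agents in the secondary datacenters. As a rough sketch, the following generates a JSON agent configuration fragment that sets this policy explicitly. The datacenter names `dc1` and `dc2` and the function name `secondary_acl_config` are hypothetical; the configuration keys (`datacenter`, `primary_datacenter`, `acl.enabled`, `acl.down_policy`) are standard Consul agent options, but check them against the documentation for your Consul version.

```python
import json

# Values accepted by Consul for acl.down_policy; "extend-cache" is the default.
VALID_DOWN_POLICIES = {"allow", "deny", "extend-cache", "async-cache"}


def secondary_acl_config(datacenter, primary_datacenter, down_policy="extend-cache"):
    """Return an agent config fragment for a secondary datacenter (illustrative)."""
    if down_policy not in VALID_DOWN_POLICIES:
        raise ValueError(f"unsupported down_policy: {down_policy!r}")
    return {
        "datacenter": datacenter,
        "primary_datacenter": primary_datacenter,
        "acl": {
            "enabled": True,
            # With "extend-cache", cached ACLs keep being honored while the
            # primary datacenter is unreachable.
            "down_policy": down_policy,
        },
    }


if __name__ == "__main__":
    # "dc2" and "dc1" are placeholder datacenter names.
    print(json.dumps(secondary_acl_config("dc2", "dc1"), indent=2))
```

Setting the policy to `deny` instead would cause ACL lookups that cannot be resolved from the primary to be rejected during an outage, so the "Read ACLs" row above would no longer hold.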