Vault Multi-Cluster Architecture Guide

  • 15 minutes
  • Enterprise
  • Vault

This document applies to Vault Enterprise versions 1.6 - 1.8.

Vault Enterprise provides features for replicating data between Vault clusters for performance, availability, and disaster recovery purposes. In this tutorial, you will architect your Vault clusters according to HashiCorp recommended patterns and practices for replicating data.

Multiple region deployment

Resilience against region failure reference diagram

(Reference diagram: resilience against region failure)

In this scenario, the clusters are replicated to guard against a full region failure. There is a primary cluster and two Performance Replica clusters (clusters A, B, and C), each with its own DR cluster (clusters A-DR, B-DR, and C-DR) in a remote region.

This architecture provides n-2 redundancy at the region level, provided that all secrets and secrets engines are replicated across all clusters. Failure of Region 1 would require the DR cluster A-DR to be promoted. Once promotion is complete, the Vault solution would be fully functional, with some loss of redundancy until Region 1 is restored. Applications would not have to re-authenticate because DR clusters replicate all leases and tokens from their source cluster.
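
For illustration, a promotion like this is driven through the sys/replication/dr API. The following is a minimal sketch only; the DR operation token is a placeholder, and generating it (for example through the generate-root workflow) is covered in the DR setup tutorial referenced later in this guide.

  # On cluster A-DR, after generating a DR operation token
  # (for example, via the `vault operator generate-root -dr-token` workflow):
  $ vault write sys/replication/dr/secondary/promote \
      dr_operation_token="<DR_OPERATION_TOKEN>"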

Resilience against cluster failure reference diagram

(Reference diagram: resilience against cluster failure)

This solution provides full resilience at the cluster level, but it does not guard against region failure because the DR clusters are in the same region as their source clusters. This is the preferred approach in cases where data cannot be replicated to other regions due to governance restrictions such as GDPR, or where the infrastructure frameworks in use cannot route application traffic to different regions.

Replication

Replication is a feature of Vault Enterprise only. See the Vault documentation for more detailed information on the replication capabilities of Vault Enterprise.

Performance replication

Vault performance replication allows for secrets management across many sites. Static secrets, authentication methods, and authorization policies are replicated so that they are active and available in multiple locations; however, leases, tokens, and dynamic secrets are not replicated.
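
As a minimal sketch of how performance replication is typically enabled with the Vault CLI (the secondary ID and the wrapping token below are placeholders, not values defined in this guide):

  # On the performance primary (for example, cluster A):
  $ vault write -f sys/replication/performance/primary/enable
  $ vault write sys/replication/performance/primary/secondary-token id="cluster-b"

  # On the performance secondary (for example, cluster B), using the
  # wrapping token returned by the secondary-token command:
  $ vault write sys/replication/performance/secondary/enable token="<WRAPPING_TOKEN>"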

NOTE: Refer to the Vault Mount Filter tutorial to learn how to prevent selected secrets engines from being replicated across regions.
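
As a hedged example of such a filter, assuming a secondary registered with the ID cluster-b and a hypothetical secret/eu-only/ path that must stay in the primary's region:

  # On the performance primary: deny replication of the listed paths
  # to the secondary identified as "cluster-b".
  $ vault write sys/replication/performance/primary/paths-filter/cluster-b \
      mode="deny" \
      paths="secret/eu-only/"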

Disaster recovery replication

Vault disaster recovery replication ensures that a standby Vault cluster is kept synchronized with an active Vault cluster. This mode of replication includes data such as ephemeral authentication tokens, time-based token information, and token usage data. This supports an aggressive recovery point objective in environments where preventing the loss of ephemeral operational data is of the utmost concern. In any enterprise solution, Disaster Recovery Replicas are considered essential.
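
The setup mirrors performance replication. A minimal sketch with placeholder names and tokens:

  # On the primary (for example, cluster A):
  $ vault write -f sys/replication/dr/primary/enable
  $ vault write sys/replication/dr/primary/secondary-token id="cluster-a-dr"

  # On the DR secondary (for example, cluster A-DR), using the
  # wrapping token returned by the secondary-token command:
  $ vault write sys/replication/dr/secondary/enable token="<WRAPPING_TOKEN>"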

NOTE: Refer to the Vault Disaster Recovery Setup tutorial for additional information.

Corruption or sabotage disaster recovery

Another common scenario to protect against, more prevalent in cloud environments that provide very high levels of intrinsic resiliency, might be the purposeful or accidental corruption of data and configuration, and/or a loss of cloud account control. Vault's DR Replication is designed to replicate live data, which would propagate intentional or accidental data corruption or deletion. To protect against these possibilities, you should implement regular backups of Vault's storage backend.

Backups with the Integrated Storage backend

You can take regular snapshots of Vault's Integrated Storage backend using the Raft Snapshot API. New infrastructure can be restored from a Raft snapshot. Refer to the Raft storage tutorial for step-by-step instructions.
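
A minimal sketch using the Vault CLI; the file names are illustrative, and you would typically run the save step on a recurring schedule (for example, from cron):

  # Take a snapshot of the Integrated Storage (Raft) backend:
  $ vault operator raft snapshot save vault-$(date +%F).snap

  # Restore a snapshot onto a new cluster; -force may be required when
  # restoring to a cluster that was initialized with different unseal keys:
  $ vault operator raft snapshot restore -force vault-2021-06-01.snap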

Backups with the Consul storage backend

You can use the Consul Snapshot command to take snapshots of a Consul storage backend. These snapshots can be used to restore Consul storage to a new datacenter.
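
A minimal sketch with the Consul CLI; file names are illustrative:

  # Take a snapshot of the Consul datacenter backing Vault:
  $ consul snapshot save vault-backend-$(date +%F).snap

  # Inspect a snapshot and restore it into a new datacenter:
  $ consul snapshot inspect vault-backend-2021-06-01.snap
  $ consul snapshot restore vault-backend-2021-06-01.snap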

Replication notes

There is no set limit on the number of clusters within a replication set. The largest deployments today are in the 30+ cluster range. A Performance Replica cluster can have a Disaster Recovery cluster associated with it and can also replicate to multiple Disaster Recovery clusters.

While a Vault cluster can hold a replication role (or roles), there are no special infrastructure considerations, and a cluster can be promoted or demoted to assume another role. Special circumstances related to mount filters and Hardware Security Module (HSM) usage may limit swapping of roles, but those depend on specific organizational configurations.

Considerations related to HSM integration

Using replication with Vault clusters that are integrated with a Hardware Security Module (HSM) or a cloud auto-unseal service for automated unseal operations involves some details that should be understood during the planning phase.

  • If a performance primary cluster uses an HSM, all other clusters within that replication set should use an HSM or KMS as well.
  • If a performance primary cluster does NOT use an HSM (that is, it uses the Shamir secret sharing method), the clusters within that replication set can be mixed: some may use an HSM, and others may use Shamir.

The following diagram illustrates the seal type examples. Notice that the second pattern is technically achievable but not recommended.

(Reference diagram: seal type examples)

This behavior is by design. A downstream Shamir cluster presents a potential attack vector in the replication group, since a threshold of key holders could recreate the root key (previously known as the master key), thereby bypassing the upstream HSM key protection.

NOTE: As of Vault 1.3, the root key is encrypted with the shared keys and stored on disk, similar to how the root key is encrypted by the auto-unseal key and stored on disk. This provides consistent behavior whether you are using Shamir's Secret Sharing algorithm or auto-unseal, and it allows all three scenarios above to be valid. However, if Vault is protecting data subject to governance and regulatory compliance requirements, it is recommended that you implement a downstream HSM for auto-unseal.
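
When planning seal types across a replication set, you can check what each cluster currently uses with the standard status command. A minimal sketch, with the output abbreviated to the relevant field:

  # Run against each cluster's active node:
  $ vault status

  # The "Seal Type" field in the output indicates the seal in use,
  # for example "shamir" (Shamir's Secret Sharing) or "awskms" (auto-unseal).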

Next steps

  • Vault with Integrated Storage Deployment Guide
  • Vault with Consul Storage Deployment Guide
  • Vault Production Hardening Guide
