Vault with Integrated Storage Reference Architecture


This guide applies to Vault versions 1.7 and above.

This guide describes recommended best practices for infrastructure architects and operators to follow when deploying Vault using the Integrated Storage (Raft) storage backend in a production environment.

This guide includes general guidance as well as specific recommendations for popular cloud infrastructure platforms. These recommendations have also been encoded into official Terraform modules for AWS, Azure, and GCP.

NOTE: If you are deploying Vault to Kubernetes, please refer to the Vault on Kubernetes Reference Architecture.

Recommended architecture

The following diagram shows the recommended architecture for deploying a single Vault cluster with maximum resiliency:

Recommended architecture diagram

With five nodes in the Vault cluster distributed between three availability zones, this architecture can withstand the loss of two nodes from within the cluster or the loss of an entire availability zone.

If deploying to three availability zones is not possible, the same architecture may be used across one or two availability zones, at the cost of a significantly higher risk of a cluster outage if an availability zone fails.

For Vault Enterprise customers, additional resiliency is possible by implementing a multi-cluster architecture, which allows for additional performance and disaster recovery options. See the Multi-Cluster Architecture Guide for more information.

System requirements

This section contains specific hardware capacity recommendations, network requirements, and additional infrastructure considerations. Since every hosting environment is different and every customer's Vault usage profile is different, these recommendations should only serve as a starting point from which each customer's operations staff may observe and adjust to meet the unique needs of each deployment.

Warning: All specifications outlined in this document are minimum recommendations. They make no allowance for vertical scaling, redundancy, or other SRE needs, and they do not account for your user volumes or use cases. Resource requirements are directly proportional to the operations performed by the Vault cluster and to end-user utilization.

Note: To match your requirements and maximize the stability of your Vault instances, perform load tests and continue to monitor resource usage as well as all metrics reported by Vault's telemetry.

Hardware sizing for Vault servers

Sizing recommendations have been divided into two common cluster sizes.

Small clusters would be appropriate for most initial production deployments or for development and testing environments.

Large clusters are production environments with a consistently high workload. That might be a large number of transactions, a large number of secrets, or a combination of the two.

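| Size  | CPU      | Memory       | Disk Capacity | Disk I/O    | Disk Throughput |
|-------|----------|--------------|---------------|-------------|-----------------|
| Small | 2-4 core | 8-16 GB RAM  | 100+ GB       | 3000+ IOPS  | 75+ MB/s        |
| Large | 4-8 core | 32-64 GB RAM | 200+ GB       | 10000+ IOPS | 250+ MB/s       |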
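The lower bounds in the sizing table can be turned into a quick pre-flight check. The following is a minimal sketch, not an official tool: the `HostSpec` type and the `shortfalls` function are hypothetical names introduced for illustration, and the numbers are the lower bound of each range in the table.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class HostSpec:
    """Hypothetical description of one Vault server's resources."""
    cpu_cores: int
    memory_gb: int
    disk_gb: int
    disk_iops: int
    disk_mbps: int


# Lower bounds taken from the sizing table; treat them as a starting point only.
MINIMUMS = {
    "small": HostSpec(cpu_cores=2, memory_gb=8, disk_gb=100, disk_iops=3000, disk_mbps=75),
    "large": HostSpec(cpu_cores=4, memory_gb=32, disk_gb=200, disk_iops=10000, disk_mbps=250),
}


def shortfalls(host: HostSpec, size: str) -> list:
    """Return the names of resources where `host` is below the minimum for `size`."""
    minimum = MINIMUMS[size]
    return [
        f.name
        for f in fields(HostSpec)
        if getattr(host, f.name) < getattr(minimum, f.name)
    ]
```

An empty list means the host meets every minimum for that cluster size. Passing this check does not replace load testing against your own usage profile.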

For each cluster size, the following table gives recommended hardware specs for each major cloud infrastructure provider.

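| Provider | Size  | Instance/VM Types                 | Disk Volume Specs                  |
|----------|-------|-----------------------------------|------------------------------------|
| AWS      | Small | m5.large, m5.xlarge               | 100+GB gp3, 3000 IOPS, 125MB/s     |
| AWS      | Large | m5.2xlarge, m5.4xlarge            | 200+GB gp3, 10000 IOPS, 250MB/s    |
| Azure    | Small | Standard_D2s_v3, Standard_D4s_v3  | 1024GB* Premium_LRS                |
| Azure    | Large | Standard_D8s_v3, Standard_D16s_v3 | 1024GB* Premium_LRS                |
| GCP      | Small | n2-standard-2, n2-standard-4      | 500GB* pd-balanced                 |
| GCP      | Large | n2-standard-8, n2-standard-16     | 1000GB* pd-ssd                     |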

NOTE: For GCP and Azure recommendations, the disk sizes listed are larger than the minimum size recommended, because for the recommended disk type, available IOPS increases with disk capacity, and the listed sizes are necessary to provision the required IOPS.

NOTE: For predictable performance on cloud providers, it's recommended to avoid "burstable" CPU and storage options (such as AWS t2 and t3 instance types) whose performance may degrade rapidly under continuous load.

NOTE: The internal database that Vault uses is optimized for modern SSD drives. Running Vault on magnetic spinning disks will incur a dramatic performance penalty.

Hardware considerations

In general, CPU and storage performance requirements will depend on the customer's exact usage profile (e.g., types of requests, average request rate, and peak request rate). Memory requirements depend on the total size of data stored in memory and should be sized according to that data.

When using Integrated Storage, the Vault servers should have a relatively high-performance disk subsystem. If many secrets are being generated or rotated frequently, this information will need to be flushed to disk often, and slower storage systems will significantly degrade performance.

In addition, HashiCorp strongly recommends configuring Vault with audit logging enabled. The impact of the additional storage I/O from audit logging will vary depending on your particular pattern of requests. For best performance, audit logs should be written to a separate disk.

Network latency and bandwidth

In order for cluster members to stay properly in sync, network latency between availability zones should be less than eight milliseconds (8 ms).

The amount of network bandwidth used by Vault will depend entirely on the specific customer's usage patterns. In many cases, even a high request volume will not translate to a large amount of network bandwidth consumption. However, all data written to Vault will be replicated to all cluster members. It's also important to consider bandwidth requirements to other external systems such as monitoring and logging collectors. Finally, a multi-cluster Vault setup will require Vault datasets to be transmitted between clusters to provide Performance and DR Replication.

Network connectivity

The following table outlines the network connectivity requirements for Vault cluster nodes. If general network egress is restricted, particular attention must be paid to granting outgoing access from the Vault servers to any external integration providers (for example, authentication and secret provider backends) as well as external log handlers, metrics collection, security and config management providers, and backup and restore systems.

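| Source          | Destination      | Port    | Protocol | Direction     | Purpose                               |
|-----------------|------------------|---------|----------|---------------|---------------------------------------|
| Client machines | Load balancer    | 443     | tcp      | incoming      | Request distribution                  |
| Load balancer   | Vault servers    | 8200    | tcp      | incoming      | Vault API                             |
| Vault servers   | Vault servers    | 8200    | tcp      | bidirectional | Cluster bootstrapping                 |
| Vault servers   | Vault servers    | 8201    | tcp      | bidirectional | Raft, replication, request forwarding |
| Vault servers   | External systems | various | various  | various       | External APIs                         |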
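Before bootstrapping a cluster, it can help to confirm that the server-to-server ports in the table are reachable. The following is a minimal sketch that only tests whether a TCP connection can be opened; it does not validate TLS or the Vault API. The function names are hypothetical, and the peer addresses you pass in are your own.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_peers(peers, ports=(8200, 8201)) -> dict:
    """Map each (peer, port) pair to whether it accepted a TCP connection.

    8200 (API) and 8201 (cluster) are the default ports listed in the table.
    """
    return {(peer, port): tcp_port_open(peer, port) for peer in peers for port in ports}
```

Run this from each Vault server against every other server, since the table requires connectivity in both directions.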

Network traffic encryption

All Vault-related network traffic should be encrypted along every segment. From client machines to the load balancer, and from the load balancer to the Vault servers, standard HTTPS TLS encryption can be used.

For communication between Vault servers (port 8201 by default) including Raft gossip, data replication, and request forwarding traffic, Vault automatically negotiates an mTLS connection when new servers join the cluster initially via the API address port (8200 by default).

Load balancer recommendations

For the highest levels of reliability and stability, it is highly recommended to use a load balancer to distribute requests to your Vault cluster members. Each major cloud platform provides good managed load balancer services. Alternatively, you can use one of a number of self-hosted options or a service discovery system such as Consul.

If you choose to terminate TLS at your load balancer, it is also strongly recommended to use TLS for the connection from the load balancer to Vault, to minimize the exposure of secret content on your network.

To monitor the health of Vault cluster nodes, the load balancer should be configured to poll the /v1/sys/health API endpoint to detect the status of the node and direct traffic accordingly. Refer to the sys/health API documentation for specific details on the query options and response codes and their meanings.
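The health check logic a load balancer applies can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the status codes are the default responses described in the sys/health API documentation, the function names are hypothetical, and whether standby nodes should receive traffic is a policy choice for your environment.

```python
import urllib.error
import urllib.request

# Default /v1/sys/health response codes (they can be changed with query options).
HEALTH_STATUS = {
    200: "active",               # initialized, unsealed, and active
    429: "standby",              # unsealed standby
    472: "dr_secondary",         # disaster recovery secondary
    473: "performance_standby",  # performance standby (Vault Enterprise)
    501: "uninitialized",
    503: "sealed",
}


def classify(status_code) -> str:
    """Translate a sys/health HTTP status code into a node state."""
    return HEALTH_STATUS.get(status_code, "unknown")


def should_receive_traffic(status_code, include_standbys: bool = False) -> bool:
    """Decide whether a node should stay in the load balancer pool."""
    state = classify(status_code)
    if state == "active":
        return True
    return include_standbys and state in ("standby", "performance_standby")


def probe(base_url: str, timeout: float = 2.0):
    """Return the sys/health status code for a node, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/sys/health", timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # non-2xx codes such as 429 or 503 arrive as HTTPError
    except (urllib.error.URLError, OSError):
        return None
```

For example, `should_receive_traffic(probe("https://vault-0.example.internal:8200"))` would keep only the active node in rotation; the hostname here is a placeholder.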

Scaling considerations

As of Vault 1.7, in a cloud-based environment, it is recommended to use a managed scaling service (such as Auto Scaling Groups on AWS) to keep your Vault cluster populated with healthy instances. However, because of the nature of the Integrated Storage backend, it's important not to replace all instances in the managed scaling group too quickly to avoid having to restore data from a snapshot.

NOTE: Auto-server cleanup is not enabled by default when using Integrated Storage. The feature must be enabled after cluster initialization via the Raft Autopilot API. Also see the Integrated Storage Autopilot Tutorial for more details.
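As a rough illustration, dead server cleanup can be enabled by posting to the Raft Autopilot configuration endpoint. The sketch below only builds the request so that it can be inspected before sending. The function name, the threshold value, and the `min_quorum` value are assumptions chosen for a five-node cluster; check the Raft Autopilot API documentation for the values that suit your deployment.

```python
import json
import urllib.request


def autopilot_cleanup_request(vault_addr: str, token: str,
                              min_quorum: int = 3,
                              dead_server_threshold: str = "10m") -> urllib.request.Request:
    """Build (but do not send) a request that enables dead server cleanup."""
    payload = {
        "cleanup_dead_servers": True,
        # How long a server may be out of contact before it is removed (assumed value).
        "dead_server_last_contact_threshold": dead_server_threshold,
        # Autopilot will not prune servers below this count (assumed value).
        "min_quorum": min_quorum,
    }
    return urllib.request.Request(
        url=f"{vault_addr}/v1/sys/storage/raft/autopilot/configuration",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )
```

To apply the configuration, pass the returned object to `urllib.request.urlopen` using a token with permission to write to that path.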

For scaling the performance of your Vault cluster, there are two factors to consider. Adding members to the Vault cluster will not increase performance for any activity that triggers writes to the Vault storage backend. However, for Vault Enterprise customers, adding performance standby nodes can provide horizontal scalability for read requests within a Vault cluster.

Failure tolerance characteristics

When deploying a Vault cluster, it's important to consider and design for your specific requirements for various failure scenarios:

Node failure

The Integrated Storage backend for Vault allows for individual node failure by replicating all data between each node of the cluster. If the leader node fails, the remaining cluster members will elect a new leader following the Raft protocol. To allow for the failure of up to two nodes in the cluster, the ideal size is five nodes for a Vault cluster using Integrated Storage.
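The five-node recommendation follows from Raft's majority rule: a cluster needs more than half of its voting members to elect a leader and commit writes. The arithmetic can be sketched as follows; the function names are illustrative only.

```python
def quorum_size(nodes: int) -> int:
    """Smallest number of voters that forms a majority."""
    return nodes // 2 + 1


def failure_tolerance(nodes: int) -> int:
    """Number of voters that can fail while a majority remains."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    for n in (3, 4, 5, 6, 7):
        print(f"{n} nodes: quorum {quorum_size(n)}, tolerates {failure_tolerance(n)} failure(s)")
```

Note that an even cluster size adds no tolerance over the next smaller odd size: four nodes tolerate one failure, the same as three.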

Availability zone failure

By deploying a Vault cluster in the recommended architecture across three availability zones, the Raft consensus algorithm should be able to maintain consistency and availability given the failure of any one availability zone.

In cases where deployment across three zones is not possible, the failure of an availability zone may cause the Vault cluster to become inaccessible or unable to elect a leader. In a two availability zone deployment, for example, the failure of one availability zone would have a 50% chance of causing a cluster to lose its Raft quorum and be unable to service requests.
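The effect of node placement on zone failure can be checked with the same majority rule. This sketch assumes every node is a Raft voter; the zone names and the function name are hypothetical.

```python
def survives_zone_loss(nodes_per_zone: dict) -> dict:
    """For each zone, report whether quorum survives losing that whole zone."""
    total = sum(nodes_per_zone.values())
    quorum = total // 2 + 1  # Raft majority
    return {
        zone: (total - count) >= quorum
        for zone, count in nodes_per_zone.items()
    }


# Five nodes across three zones: any single zone can be lost.
three_zones = survives_zone_loss({"zone-a": 2, "zone-b": 2, "zone-c": 1})

# Five nodes across two zones: losing the three-node zone breaks quorum.
two_zones = survives_zone_loss({"zone-a": 3, "zone-b": 2})
```

In the two-zone case only one of the two possible zone failures is survivable, which is where the 50% figure comes from.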

Region or cluster failure

In the event of a failure of an entire region or cluster, Vault Enterprise provides replication features that can help provide resiliency across multiple clusters and/or regions. Please see the Multi-Cluster Architecture Guide for more information.

External token storage

The Tokenization transformation feature reached General Availability in Vault 1.7. This feature introduces additional architectural considerations.

The tokenization feature requires an external data store to facilitate the mapping of tokens to cryptographic values. Be sure to architect your external data stores for high availability. Where possible, follow reliability and disaster-recovery architectural patterns that meet the same requirements you have for Vault itself. To ensure data consistency, the backup cadence of the external data store must be kept in sync with backups of Vault.

Glossary

Vault cluster

A Vault cluster is a set of Vault processes that together run a Vault service. These Vault processes could be running on physical or virtual servers or in containers.

Availability zone

An availability zone is a single network failure domain that hosts part or all of a Vault cluster. Examples of availability zones include:

  • An isolated datacenter
  • An isolated cage in a datacenter, if it is isolated from other cages by all other means (power, network, etc.)
  • An "Availability Zone" in AWS or Azure, or a "Zone" in GCP

Region

A region is a collection of one or more availability zones on a low-latency network. Regions are typically separated by significant distances. A region could host one or more Vault clusters, but a single Vault cluster would not be spread across multiple regions due to network latency issues.

Autoscaling

Autoscaling is the process of automatically scaling computational resources based on service activity. Autoscaling may be either horizontal, meaning to add more machines into the pool of resources, or vertical, meaning to increase the capacity of existing machines.

Each major cloud provider offers a managed autoscaling service:

| Cloud | Managed Autoscaling Service |
|-------|-----------------------------|
| AWS   | Auto Scaling Groups         |
| Azure | Virtual Machine Scale Sets  |
| GCP   | Managed Instance Groups     |

Load balancer

A load balancer is a system that distributes network requests across multiple servers. It may be a managed service from a cloud provider, a physical network appliance, a piece of software, or a service discovery platform such as Consul.

Each major cloud provider offers one or more managed load balancing services:

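| Cloud | Layer     | Managed Load Balancing Service |
|-------|-----------|--------------------------------|
| AWS   | Layer 4   | Network Load Balancer          |
| AWS   | Layer 7   | Application Load Balancer      |
| Azure | Layer 4   | Azure Load Balancer            |
| Azure | Layer 7   | Azure Application Gateway      |
| GCP   | Layer 4/7 | Cloud Load Balancing           |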

Next steps

  • Vault Multi-Cluster Architecture Guide

  • Vault with Integrated Storage Deployment Guide

  • Vault Production Hardening Guide

Additional references

  • Vault internal architecture documentation

  • Integrated Storage reference
