Run a reliable Vault cluster
This document outlines implementation resources for maintaining a reliable Vault cluster. When you implement proper reliability measures, you ensure high availability, fault tolerance, and consistent performance of your Vault infrastructure.
The following sections cover architecture, monitoring, resource management, recovery, and resilience.
Architecture
Learn about Vault Community Edition and Enterprise architecture and best practices to build reliable Vault clusters.
- Vault reference architecture.
- Implement a robust Vault Enterprise cluster.
- Follow recommended patterns to keep your Vault cluster operating reliably.
Monitoring
Collect telemetry data from Vault to view performance, audits, and infrastructure usage, and to confirm that Vault is operating reliably.
- Read the telemetry metrics overview to learn about the three types of telemetry metrics: counter, gauge, and summary.
- Monitor common Vault metrics such as core, usage, storage backend, audit, and resource.
- Use the Vault usage metrics dashboard in the Vault UI to filter usage metrics by namespace or auth methods.
- Enable Vault telemetry gathering to collect telemetry data from your Vault cluster.
- Use audit devices to collect a detailed log of all requests to Vault and their responses.
- Monitor the underlying infrastructure Vault runs on.
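As a rough illustration of the three metric types listed above, the following Python sketch groups metrics from a Prometheus-format telemetry payload by their declared type. It assumes telemetry is exposed in Prometheus text format, where each metric is announced with a `# TYPE` line. The `group_metrics_by_type` helper and the sample payload are hypothetical and exist only for this example; real output depends on your Vault version and configuration.

```python
from collections import defaultdict


def group_metrics_by_type(exposition_text: str) -> dict[str, list[str]]:
    """Group metric names by the type declared on their '# TYPE' lines."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for line in exposition_text.splitlines():
        parts = line.split()
        # A type declaration looks like: "# TYPE <metric_name> <metric_type>"
        if len(parts) == 4 and parts[0] == "#" and parts[1] == "TYPE":
            _, _, name, metric_type = parts
            grouped[metric_type].append(name)
    return dict(grouped)


# Illustrative payload only (an assumption, not captured from a real cluster).
SAMPLE = """\
# TYPE vault_expire_num_leases gauge
vault_expire_num_leases 42
# TYPE vault_core_handle_request summary
vault_core_handle_request_count 10
# TYPE vault_audit_log_request_failure counter
vault_audit_log_request_failure 0
"""

if __name__ == "__main__":
    for metric_type, names in sorted(group_metrics_by_type(SAMPLE).items()):
        print(metric_type, names)
```

Grouping by type is a convenient first step when deciding how to alert: gauges are usually compared against thresholds, while counters are usually tracked as rates of change.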
Resource management
Efficiently manage your Vault infrastructure, scaling, and performance.
- Enforce resource limits with Vault resource quotas and lease count quotas.
- Prevent lease explosions in your Vault cluster.
- Benchmark and measure Vault performance in environments which resemble production use cases to produce realistic results.
- Use Vault Enterprise Adaptive overload protection to prevent client requests from overwhelming different server resources.
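The quotas mentioned above can be managed through Vault's HTTP API under `sys/quotas`. The following Python sketch builds, but does not send, requests that would create a rate limit quota and a lease count quota. The `build_quota_request` helper, the address, the token, the quota names, the mount path, and the limit values are all placeholders chosen for illustration; treat this as a sketch to adapt rather than a recommended configuration.

```python
import json
import urllib.request


def build_quota_request(vault_addr, token, kind, name, settings):
    """Build (without sending) a request that creates or updates a quota.

    kind is either "rate-limit" or "lease-count".
    """
    if kind not in ("rate-limit", "lease-count"):
        raise ValueError(f"unsupported quota kind: {kind}")
    url = f"{vault_addr.rstrip('/')}/v1/sys/quotas/{kind}/{name}"
    return urllib.request.Request(
        url,
        data=json.dumps(settings).encode("utf-8"),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; choose limits based on your own benchmarks.
rate_limit_request = build_quota_request(
    "http://127.0.0.1:8200",
    "example-token",
    "rate-limit",
    "global-rate",
    {"rate": 500, "interval": "1s"},
)
lease_count_request = build_quota_request(
    "http://127.0.0.1:8200",
    "example-token",
    "lease-count",
    "database-leases",
    {"path": "database/", "max_leases": 10000},
)

if __name__ == "__main__":
    for request in (rate_limit_request, lease_count_request):
        print(request.get_method(), request.full_url, request.data.decode())
```

To apply a quota, pass the built request to `urllib.request.urlopen` against a cluster where the token has permission to manage quotas. Lease count quotas require Vault Enterprise.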
Recovery
Recover Vault in the case of cluster degradation through the use of regular backups.
- Use the `-recovery` flag to bring Vault up in recovery mode.
- Configure Vault Enterprise to take regular backups.
- Vault Enterprise performance replication provides consistency, scalability, and highly available disaster recovery.
Resilience
Run a resilient Vault cluster to avoid application downtime.
- Run Vault in high availability (HA) mode to protect against outages by running multiple Vault servers.
- Run performance standby nodes.
- Use integrated storage for durable storage.
- Run Vault Enterprise redundancy zones to increase read scaling and resiliency.
- Use Vault Enterprise multi-datacenter replication for high availability and scalability through a primary/secondary (1:N) asynchronous model.
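When you run several Vault servers, it helps to know which role each node currently holds. The following Python sketch interprets the default HTTP status codes returned by Vault's `sys/health` endpoint to tell active, standby, performance standby, and disaster recovery secondary nodes apart. The helper names and the address in the usage note are placeholders, and the mapping covers only the default codes, which can be overridden with query parameters on the endpoint.

```python
import urllib.error
import urllib.request

# Default status codes returned by GET /v1/sys/health.
HEALTH_STATUS = {
    200: "initialized, unsealed, and active",
    429: "unsealed and standby",
    472: "disaster recovery secondary",
    473: "performance standby",
    501: "not initialized",
    503: "sealed",
}


def describe_health(status_code: int) -> str:
    """Translate a sys/health status code into a readable node state."""
    return HEALTH_STATUS.get(status_code, f"unexpected status {status_code}")


def fetch_health_status(vault_addr: str, timeout: float = 5.0) -> int:
    """Query sys/health on one node and return the HTTP status code."""
    url = f"{vault_addr.rstrip('/')}/v1/sys/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on non-2xx codes, which Vault uses to signal node state.
        return err.code
```

For example, `describe_health(fetch_health_status("http://127.0.0.1:8200"))` reports the state of a single node. A load balancer health check can apply the same mapping to decide which nodes should receive traffic.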
Next steps
In this document, you learned about the HashiCorp resources for implementing and running a reliable Vault cluster. The following are implementation guides on the other HashiCorp products.