Vault
Integrated Storage
Vault supports a number of storage options for the durable storage of Vault's information. As of Vault 1.4 an integrated storage option is offered. This storage backend does not rely on any third party systems, implements high availability semantics, supports Enterprise Replication features, and provides backup/restore workflows.
The integrated storage option stores Vault's data on the server's filesystem and uses a consensus protocol to replicate data to each server in the cluster. More information on the internals of integrated storage can be found in the integrated storage internals documentation. Additionally, the Configuration docs can help in configuring Vault to use integrated storage.
The sections below go into various details on how to operate Vault with integrated storage.
Cluster Membership
This section will outline how to bootstrap and manage a cluster of Vault nodes running integrated storage.
Integrated storage is bootstrapped during the initialization process, and results in a cluster of size 1. Depending on the desired deployment size, nodes can be joined to the active Vault node.
Joining Nodes
Joining is the process of taking an uninitialized Vault node and making it a member of an existing cluster. In order to authenticate the new node to the cluster it must use the same seal mechanism. If using a Auto Unseal the node must be configured to use the same KMS provider and Key as the cluster it's attempting to join. If using a Shamir seal the unseal keys must be provided to the new node before the join process can complete. Once a node has successfully joined, data from the active node can begin to replicate to it. Once a node has been joined it cannot be re-joined to a different cluster.
You can either join the node automatically via the config file or manually through the API (both methods described below). When joining a node, the API address of the leader node must be used. We recommend setting the api_addr configuration option on all nodes to make joining simpler.
retry_join Configuration
This method enables setting one, or more, target leader nodes in the config file. When an uninitialized Vault server starts up it will attempt to join each potential leader that is defined, retrying until successful. When one of the specified leaders become active this node will successfully join. When using Shamir seal, the joined nodes will still need to be unsealed manually. When using Auto Unseal the node will be able to join and unseal automatically.
An example retry_join config can be seen below:
storage "raft" {
path = "/var/raft/"
node_id = "node3"
retry_join {
leader_api_addr = "https://node1.vault.local:8200"
}
retry_join {
leader_api_addr = "https://node2.vault.local:8200"
}
}
Note, in each retry_join
stanza, you may provide a single leader_api_addr
or
auto_join
value. When a cloud auto_join
configuration value is provided, Vault
will use go-discover to automatically
attempt to discover and resolve potential Raft leader addresses.
See the go-discover
README for
details on the format of the auto_join
value.
storage "raft" {
path = "/var/raft/"
node_id = "node3"
retry_join {
auto_join = "provider=aws region=eu-west-1 tag_key=vault tag_value=... access_key_id=... secret_access_key=..."
}
}
By default, Vault will attempt to reach discovered peers using HTTPS and port 8200.
Operators may override these through the auto_join_scheme
and auto_join_port
fields respectively.
storage "raft" {
path = "/var/raft/"
node_id = "node3"
retry_join {
auto_join = "provider=aws region=eu-west-1 tag_key=vault tag_value=... access_key_id=... secret_access_key=..."
auto_join_scheme = "http"
auto_join_port = 8201
}
}
Join from the CLI
Alternatively you can use the join CLI command or the API to join a node. The active node's API address will need to be specified:
$ vault operator raft join https://node1.vault.local:8200
Non-Voting Nodes (Enterprise Only)
Nodes that are joined to a cluster can be specified as non-voters. A non-voting node has all of Vault's data replicated to it, but does not contribute to the quorum count. This can be used in conjunction with Performance Standby nodes to add read scalability to a cluster in cases where a high volume of reads to servers are needed.
$ vault operator raft join -non-voter https://node1.vault.local:8200
Removing Peers
Removing a peer node is a necessary step when you no longer want the node in the cluster. This could happen if the node is rotated for a new one, the hostname permanently changes and can no longer be accessed, you're attempting to shrink the size of the cluster, or for many other reasons. Removing the peer will ensure the cluster stays at the desired size, and that quorum is maintained.
To remove the peer you can issue a remove-peer command and provide the node ID you wish to remove:
$ vault operator raft remove-peer node1
Peer removed successfully!
Listing Peers
To see the current peer set for the cluster you can issue a list-peers command. All the voting nodes that are listed here contribute to the quorum and a majority must be alive for integrated storage to continue to operate.
$ vault operator raft list-peers
Node Address State Voter
---- ------- ----- -----
node1 node1.vault.local:8201 follower true
node2 node2.vault.local:8201 follower true
node3 node3.vault.local:8201 leader true
Server-to-Server Communication
Once nodes are joined to one another they begin to communicate using mTLS over
Vault's cluster port. The cluster port defaults to 8201
. The TLS information
is exchanged at join time and is rotated on a cadence.
A requirement for integrated storage is that the cluster_addr configuration option is set. This allows Vault to assign an address to the node ID at join time.
Outage Recovery
Quorum Maintained
This section outlines the steps to take when a single server or multiple servers are in a failed state but quorum is still maintained. This means the remaining alive servers are still operational, can elect a leader, and are able to process write requests.
If the failed server is recoverable, the best option is to bring it back online and have it reconnect to the cluster with the same host address. This will return the cluster to a fully healthy state.
If this is impractical, you need to remove the failed server. Usually, you can issue a remove-peer command to remove the failed server if it's still a member of the cluster.
If the remove-peer command isn't possible or you'd rather manually re-write the
cluster membership a raft/peers.json
file can be written to the configured
data directory.
Quorum Lost
In the event that multiple servers are lost, causing a loss of quorum and a complete outage, partial recovery is still possible.
If the failed servers are recoverable, the best option is to bring them back online and have them reconnect to the cluster using the same host addresses. This will return the cluster to a fully healthy state.
If the failed servers are not recoverable, partial recovery is possible using data on the remaining servers in the cluster. There may be data loss in this situation because multiple servers were lost, so information about what's committed could be incomplete. The recovery process implicitly commits all outstanding Raft log entries, so it's also possible to commit data that was uncommitted before the failure.
See the section below on manual recovery using peers.json for details of the recovery procedure. You include only the remaining servers in the raft/peers.json recovery file. The cluster should be able to elect a leader once the remaining servers are all restarted with an identical raft/peers.json configuration.
Any servers you introduce later can be fresh with totally clean data directories and joined using Vault's join command.
In extreme cases, it should be possible to recover with just a single remaining server by starting that single server with itself as the only peer in the raft/peers.json recovery file.
Manual Recovery Using peers.json
Using raft/peers.json for recovery can cause uncommitted Raft log entries to be implicitly committed, so this should only be used after an outage where no other option is available to recover a lost server. Make sure you don't have any automated processes that will put the peers file in place on a periodic basis.
To begin, stop all remaining servers.
The next step is to go to the configured data path of each Vault server. Inside that directory, there will be a raft/ sub-directory. We need to create a raft/peers.json file. The file should be formatted as a JSON array containing the node ID, address:port, and suffrage information of each Vault server you wish to be in the cluster.
[
{
"id": "node1",
"address": "node1.vault.local:8201",
"non_voter": false
},
{
"id": "node2",
"address": "node2.vault.local:8201",
"non_voter": false
},
{
"id": "node3",
"address": "node3.vault.local:8201",
"non_voter": false
}
]
id
(string: <required>)
- Specifies the node ID of the server. This can be found in the config file, or inside the node-id file in the server's data directory if it was auto-generated.address
(string: <required>)
- Specifies the host and port of the server. The port is the server's cluster port.non_voter
(bool: <false>)
- This controls whether the server is a non-voter. If omitted, it will default to false, which is typical for most clusters. This is an enterprise only feature.
Create entries for all servers. You must confirm that servers you do not include here have indeed failed and will not later rejoin the cluster. Ensure that this file is the same across all remaining server nodes.
At this point, you can restart all the remaining servers. The cluster should be in an operable state again. One of the nodes should claim leadership and become active.
Other Recovery Methods
For other, non-quorum related recovery Vault's recovery mode can be used.