Vault: Migrate the storage backend from DynamoDB to Integrated Storage (Raft)
HashiCorp Products | Vault Enterprise, Vault Community Edition |
---|---|
Partner Products | AWS DynamoDB, AWS EC2 |
Maturity Model | Standardize |
Use Case Coverage | Migrating Vault backends |
Tags | Vault |
Guide Type | HVD Integration Guide |
Publish Date, Version | August/30/2024, Version 1.0.0 |
Authors | Mustafa EL-Hilo |
Purpose of this guide
This document provides guidance and instructions on migrating from Vault Community Edition backed by AWS DynamoDB to Vault Enterprise with integrated storage (Raft) deployed in AWS. The guide equips you with the knowledge and tools necessary to understand and address the challenges associated with a Vault storage migration. These steps are intended for practice in a non-production environment first, allowing you to develop a customized migration plan for production use.
Additionally, through this guide you will be able to gain confidence in sizing your new Vault Enterprise cluster compared to your current cluster.
Target audience
- Platform Team (Vault infrastructure owners)
Benefits
Utilizing Vault integrated storage provides the following benefits:
- The Raft consensus algorithm ensures high availability and data replication across multiple nodes, providing a robust and resilient storage backend
- Integrated storage is supported by HashiCorp, whereas DynamoDB is only community supported
- Integrated storage is a "built-in" storage option that supports backup/restore workflows, high availability, and Enterprise replication features without relying on third-party systems
- Autopilot
- Automated upgrades
- Vault Enterprise replication
Prerequisites and limitations
Before starting this guide, we recommend that you review the following:
Prerequisites - documentation
- Deploying Vault in AWS on EC2 Using Terraform
- Vault Solution Design Guide | Core Design Requirements
- Design your Vault Enterprise cluster
- operator migrate - Command | Vault | HashiCorp Developer
- Benchmark Vault performance
Prerequisites - setup
- An active AWS account
- Vault cluster with DynamoDB storage backend
- Vault Enterprise license
Integration architecture
The main components of this integration are:
- Vault cluster A
- Binary: Vault Community Edition
- Storage Backend: AWS DynamoDB
- Vault cluster B
- Binary: Vault Enterprise
- Storage backend: Integrated storage (Raft)
- IAM Access to DynamoDB used by Vault cluster A
Note
Vault cluster B must not be initialized or started before the migration.
Sizing the destination Vault cluster B to match the performance of the current Vault cluster isn’t trivial. Utilize the Vault: Solution Design Guide as a starting point and execute performance benchmark tests prior to going to production to ensure business continuity and a smooth transition.
Since you are migrating to integrated storage, you will require larger EC2 instances than those currently running Vault with a DynamoDB backend. This is because additional IO and compute are required to run integrated storage on the EC2 instances themselves.
The following are the high-level steps taken during the migration. For simplicity, a three-node cluster is shown.
The migration begins with a Vault cluster that interacts with DynamoDB for read and write operations.
Shut down Vault cluster A to prepare to migrate the data from DynamoDB to the new Vault cluster. The migration operator will move the data from DynamoDB to the disk file path on node 1. Notice that the data only goes to the first node while Vault itself is still not up.
In the end state, Vault cluster B is configured to replicate data from Node 1 to the other two nodes within the cluster. All nodes maintain a consistent state, ensuring that any data written to Node 1 is immediately available on Node 2 and Node 3.
Additional considerations:
Note
Ensure the Vault version between the two clusters is the same. Upgrading Vault should be done as a separate task (either before or after this migration).
Note
Ensure your Vault storage migration plan accounts for your seal/unseal process. For example, if you are using AWS KMS for auto-unseal you need to grant the Vault cluster B access to AWS KMS.
Best practices
People and process
The Vault: Operating Guide for Adoption outlines the various teams and responsibilities required to operate Vault successfully. For this task, we are primarily interested in the platform team that operates and manages Vault infrastructure.
Performing a Vault storage migration requires coordination with all the other teams, as it will require downtime during the migration process.
Object planning considerations
Due to the disruptive nature of the migration process it’s paramount that the platform team exercises caution. Use this guide as a way to practice on a non-production cluster first to establish baseline benchmark measurements and practice the steps. By utilizing benchmarking you will gain confidence that you are sizing the new cluster correctly.
Once you have a good understanding of the steps involved, write a migration plan that works for your team. Communicate your migration plan to all stakeholders. More importantly, practice the migration in a safe environment. Repeat the migration until the team has a high level of confidence. Your migration plan should account for:
- Stakeholders that need to be notified
- Timeline and schedule for the migration
- Downtime and maintenance window requirements
- Vault Storage migration steps and procedure
- A roll-back plan
- Testing and validation steps for the migration process
- Verification of post-migration system integrity
Checklist
- Set up a non-production environment that is a clone of the current production environment
- Set up EC2 instances with the Vault Enterprise binary. These will later make up Vault cluster B. Use the HVD: Deploying Vault in AWS on EC2 Using Terraform as a reference for your design and configuration.
- Create an EC2 instance to be used for `vault-benchmark` that can access the Vault cluster being migrated
- Vault Enterprise license
- Observability and monitoring has been configured for the Vault cluster being migrated
Warning
All the following steps are to be completed on non-production resources first. Once the steps are understood they will need to be adjusted to your own situation. These steps assume you have created a Vault cluster of identical (or similar) properties to what you currently have in Production.
Step 1: Benchmark Vault cluster A
Note
Ensure that [observability and monitoring](https://www.hashicorp.com/blog/hashicorp-vault-observability-monitoring-vault-at-scale) is completed for Vault cluster A to be able to see how Vault is handling this additional load.
Note
For optimal results, run `vault-benchmark` on a dedicated host and not on a Vault node.
This step is done to create a baseline performance understanding of the current cluster. The same benchmark tests should be run here and later on Vault cluster B.
Task: Install vault-benchmark
Persona: Platform engineer
Description:
To begin, we recommend measuring the performance of the existing Vault cluster before migrating the storage backend. This measured approach enables you to determine if your migration was ultimately a success and establish a baseline to measure against.
To measure the performance of a Vault cluster, we recommend using Vault Benchmark, a tool developed and maintained by HashiCorp.
Follow the steps outlined in the vault-benchmark source repository to install it on the dedicated EC2 instance used for benchmarking.
Task: Configure and Run vault-benchmark
Persona: Platform engineer
Description:
Here we will create a Vault Benchmark configuration. Use the sample configuration as a starting point and add additional tests as needed. Your aim should be to create a substantial load on this Vault cluster and push it to its limit. You don’t need to test every auth method type and secrets engine you currently use; instead, focus the tests on pushing the cluster to its performance limit.
The goal here is to establish a baseline of performance to measure the new cluster against, so both clusters must utilize the same set of tests. You’ll eventually need to create two configurations, but for now our focus is to benchmark Vault cluster A.
Running Vault Benchmark will result in Vault secrets and auth methods being created. Vault Benchmark does some level of cleanup but not for all tests. The duration of the Vault Benchmark is defined by the `duration` field in the configuration, during which Vault Benchmark will create the auth methods and secrets defined, splitting them based on the `weight`.
For example, create a file named `config_cluster_a.hcl` with:

    vault_addr = "http://<VAULT CLUSTER A HOSTNAME / IP>:8200"
    vault_token = "root"
    vault_namespace="root"
    duration = "30s"
    cleanup = true

    test "approle_auth" "approle_logins" {
      weight = 50
      config {
        role {
          role_name = "benchmark-role"
          token_ttl="2m"
        }
      }
    }

    test "kvv2_write" "static_secret_writes" {
      weight = 50
      config {
        numkvs = 100
        kvsize = 100
      }
    }

Then run the binary with the configuration path: `vault-benchmark run -config=config_cluster_a.hcl`
Task: Capture and store vault-benchmark results
Persona: Platform engineer
Description:
Capture the benchmark results and store them for later comparison. The output looks similar to the following:
| op | count | rate | throughput | mean | 95th% | 99th% | successRatio |
|----------------------|-------|-----------|------------|-------------|-------------|--------------|--------------|
| approle_logins | 2619 | 87.350307 | 87.195473 | 61.307741ms | 78.387069ms | 94.788229ms | 100.00% |
| static_secret_writes | 2536 | 84.530378 | 84.369422 | 55.059506ms | 82.072336ms | 109.441173ms | 100.00% |
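One way to keep the results for later comparison is to write each run to a timestamped file. The following is a minimal sketch, not part of the original guide: the `capture_results` function, the `RESULTS_DIR` variable, and the file naming scheme are all illustrative assumptions.

```shell
#!/bin/sh
# Sketch: run a command and keep a timestamped copy of its output.
# RESULTS_DIR and the file naming scheme are illustrative assumptions.

capture_results() {
  # $1 = label for the cluster (for example "cluster_a"); remaining args = command to run
  label="$1"; shift
  results_dir="${RESULTS_DIR:-./benchmark-results}"
  mkdir -p "$results_dir" || return 1
  outfile="$results_dir/${label}_$(date +%Y%m%dT%H%M%S).txt"
  # Store stdout and stderr of the command in the results file.
  "$@" > "$outfile" 2>&1
  status=$?
  # Print the path so the caller knows where the results were written.
  echo "$outfile"
  return "$status"
}
```

For example, `capture_results cluster_a vault-benchmark run -config=config_cluster_a.hcl` would store the run under `./benchmark-results/` and print the file path.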
Step 2: Prepare Vault cluster A
Task: Stop Vault cluster A
Persona: Platform engineer
Description:
Stop the Vault process on all nodes. The purpose of this step is to stop read and write operations to DynamoDB during the migration process.
How you stop Vault depends on how it is running. If Vault is managed by systemd, run `sudo systemctl stop vault`.
Ensure Vault has stopped:
- On the server, run `vault status`; you should get back a `connection refused` error
- Check the Vault logs for `... [INFO] core: cluster listeners successfully shut down`
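The log check can be scripted when several nodes are involved. The sketch below assumes the logs of each node have already been collected into local files; the `nodes_missing_shutdown` function name and that collection step are hypothetical and not part of the original guide.

```shell
#!/bin/sh
# Sketch: report which log files do NOT contain the shutdown message.
# Assumes one log file per Vault node has been collected beforehand.

nodes_missing_shutdown() {
  # Prints each log file that lacks the message; returns non-zero if any is missing.
  missing=0
  for logfile in "$@"; do
    if ! grep -q "core: cluster listeners successfully shut down" "$logfile"; then
      echo "$logfile"
      missing=1
    fi
  done
  return "$missing"
}
```

If Vault logs to journald under a unit named `vault` (an assumption about your setup), a per-node log file could be produced with `journalctl -u vault --no-pager > node1.log` before running `nodes_missing_shutdown node1.log node2.log node3.log`.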
Task: Create DynamoDB backup
Persona: Platform engineer
Description:
- Follow AWS’s documentation to create a DynamoDB backup
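As an illustration, an on-demand backup can be requested with the AWS CLI `aws dynamodb create-backup` command. The wrapper below is a sketch only: the `backup_dynamodb_table` function, the backup naming scheme, and the `DRY_RUN` switch are assumptions, and by default it only prints the command so it can be reviewed first.

```shell
#!/bin/sh
# Sketch: build (and optionally run) an on-demand DynamoDB backup command.
# The backup name format and the DRY_RUN switch are illustrative assumptions.

backup_dynamodb_table() {
  # $1 = DynamoDB table name, $2 = AWS region
  table="$1"
  region="$2"
  backup_name="${table}-pre-raft-migration-$(date +%Y%m%d)"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: only print the command for review.
    echo "aws dynamodb create-backup --table-name $table --backup-name $backup_name --region $region"
  else
    aws dynamodb create-backup --table-name "$table" --backup-name "$backup_name" --region "$region"
  fi
}
```

Running `DRY_RUN=0 backup_dynamodb_table <DYNAMODB TABLE NAME> <REGION>` would execute the backup, provided the AWS CLI is installed and the caller has the required IAM permissions.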
Step 3: Create and run Vault migration configuration on Vault cluster B
Task: Verify Vault cluster B status
Persona: Platform engineer
Description:
Note
Vault cluster B should not be initialized or started.
Follow your preferred method to install the Vault Enterprise binary. The Vault Enterprise binary version should match the current Vault version in Vault cluster A.
Alternatively, follow the HVD: Deploying Vault in AWS on EC2 Using Terraform. Take care not to start or initialize Vault, which will require modifying the initialization script provided.
Task: Create Vault migration configuration
Persona: Platform engineer
Description:
We are now ready to migrate the data from DynamoDB to one of the EC2s that will eventually be part of Vault cluster B.
- Pick a node in Vault cluster B and create a file `migrate.hcl` in `/etc/vault.d/`
- Configuration details for `storage_source` and `storage_destination` can be found in their respective storage configuration documentation. In this case it’s DynamoDB and integrated storage.
For example, you’ll need to replace `ha_enabled`, `region`, `table`, `node_id` and `cluster_addr`:

    storage_source "dynamodb" {
      ha_enabled = "<true or false>"
      region     = "<REGION>"
      table      = "<DYNAMODB TABLE NAME>"
    }

    storage_destination "raft" {
      path    = "/opt/vault/data"
      node_id = "<Hostname / IP of current node>"
    }

    cluster_addr = "https://<Hostname / IP of current node>:8201"

Note that the destination storage type is the lowercase `raft`.
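When the migration is rehearsed repeatedly, it can help to render the migration configuration from a few variables instead of editing it by hand. This is a minimal sketch under stated assumptions: the `write_migrate_config` function and the `HA_ENABLED` variable are hypothetical helpers, and the data path simply mirrors the example in this guide.

```shell
#!/bin/sh
# Sketch: render a migration configuration file from a few variables.
# HA_ENABLED defaults to "true" here purely as an assumption; set it to match cluster A.

write_migrate_config() {
  # $1 = output path, $2 = AWS region, $3 = DynamoDB table, $4 = hostname or IP of this node
  ha_enabled="${HA_ENABLED:-true}"
  cat > "$1" <<EOF
storage_source "dynamodb" {
  ha_enabled = "$ha_enabled"
  region = "$2"
  table = "$3"
}

storage_destination "raft" {
  path = "/opt/vault/data"
  node_id = "$4"
}

cluster_addr = "https://$4:8201"
EOF
}
```

For example, `write_migrate_config /etc/vault.d/migrate.hcl <REGION> <DYNAMODB TABLE NAME> <Hostname / IP of current node>` would produce a file equivalent to the example above.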
Task: Run Vault migration operator
Persona: Platform engineer
Description:
Note
The same user should be used to run this command as the user that will later run Vault. As outlined in the [Vault: Solution Design Guide](https://developer.hashicorp.com/validated-designs/vault-solution-design-guides-vault-enterprise/deploying-vault-private-datacenter#install-vault), we will use the `vault` user.
- To start the migration as the `vault` user, we will use `sudo -u vault` as follows:

    sudo -u vault vault operator migrate -config /etc/vault.d/migrate.hcl

- Note that this command can take time to complete depending on how much data is in Vault. Use `tmux` or `screen` to run the command and detach from your session. Note the amount of time this migration takes, as you’ll need to account for it in your migration plan.
- Once the migration is done, you should see the message: `Success! All of the keys have been migrated.`
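Since the migration duration feeds into the maintenance window, it may be useful to record it automatically. The wrapper below is a sketch, not an official tool: `run_timed_migration` and its log file argument are hypothetical, and it only looks for the success message quoted in this guide.

```shell
#!/bin/sh
# Sketch: run a long command, keep its output, and report the elapsed time
# plus whether the expected success message appeared in the output.

run_timed_migration() {
  # $1 = log file; remaining args = command to run
  logfile="$1"; shift
  start=$(date +%s)
  "$@" > "$logfile" 2>&1
  end=$(date +%s)
  echo "elapsed_seconds=$((end - start))"
  if grep -qF "Success! All of the keys have been migrated." "$logfile"; then
    echo "result=success"
  else
    # The message was not found; inspect the log file before continuing.
    echo "result=check_log"
    return 1
  fi
}
```

A possible invocation is `run_timed_migration /tmp/migrate.log sudo -u vault vault operator migrate -config /etc/vault.d/migrate.hcl`, run inside `tmux` or `screen`.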
Step 4: Stand up Vault cluster B
Vault configurations vary depending on the features enabled. Your next step here will depend on what you have enabled in the Vault configuration. You only need to perform one of the following two tasks.
Task: If using `retry_join`, add nodes to Vault cluster B
Persona: Platform engineer
Description:
Start Vault on all nodes in cluster B. It’s safe to delete the `migrate.hcl` file on the node and proceed with the final Vault configuration for all the other nodes.
Unseal Vault on all nodes
- If using Shamir seal, use the same keys from Vault cluster A
- If using AWS KMS auto unseal, Vault unseals automatically after it’s started. Check that AWS KMS IAM permissions are granted to the EC2 instances. Use `vault status` to check.
Check if all Vault nodes join the Raft cluster

    vault operator raft list-peers

The output should look similar to the following:

    $ vault operator raft list-peers
    Node       Address            State       Voter
    ----       -------            -----       -----
    vault_1    10.0.1.21:8201     leader      true
    vault_2    10.0.2.18:8201     follower    true
    vault_3    10.0.3.236:8201    follower    true
    vault_4    10.0.1.32:8201     follower    false
    vault_5    10.0.2.39:8201     follower    false
    vault_6    10.0.3.226:8201    follower    false

Check logs for any issues if the nodes do not appear in `list-peers`
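For reference, the `retry_join` path relies on a `retry_join` block inside the `storage "raft"` stanza of each node's final server configuration. The following sketch prints such a stanza; the `write_raft_storage_stanza` function is a hypothetical helper, the data path mirrors the one used for the migration, and the listed addresses are whatever API addresses your cluster B nodes actually use.

```shell
#!/bin/sh
# Sketch: print a storage "raft" stanza with one retry_join block per peer address.
# The path mirrors the migration example; adjust it to your deployment.

write_raft_storage_stanza() {
  # $1 = node_id for this node; remaining args = API addresses of cluster nodes
  node_id="$1"; shift
  echo 'storage "raft" {'
  echo '  path    = "/opt/vault/data"'
  echo "  node_id = \"$node_id\""
  for addr in "$@"; do
    echo '  retry_join {'
    echo "    leader_api_addr = \"$addr\""
    echo '  }'
  done
  echo '}'
}
```

The output would be merged into the node's Vault configuration file alongside the listener, seal, and license settings, which are outside the scope of this sketch.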
Task: If not using `retry_join`, add nodes to Vault cluster B
Persona: Platform engineer
Description:
Start Vault on all nodes in cluster B. It’s safe to delete the `migrate.hcl` file on the node and proceed with the final Vault configuration for all the other nodes.
Add Vault nodes to the Raft storage cluster. Run the join command on each node being added; the address argument is the API address of the node that already holds the migrated data (the leader):

    vault operator raft join http://<Hostname / IP of the leader node>:8200

Unseal Vault on all nodes
- If using Shamir seal, use the same keys from Vault cluster A
- If using AWS KMS auto unseal, Vault unseals automatically after it’s started. Check that AWS KMS IAM permissions are granted to the EC2 instances. Use `vault status` to check.
Check if all Vault nodes have been added to the Raft cluster

    vault operator raft list-peers

The output should look similar to the following:

    $ vault operator raft list-peers
    Node       Address            State       Voter
    ----       -------            -----       -----
    vault_1    10.0.1.21:8201     leader      true
    vault_2    10.0.2.18:8201     follower    true
    vault_3    10.0.3.236:8201    follower    true
    vault_4    10.0.1.32:8201     follower    false
    vault_5    10.0.2.39:8201     follower    false
    vault_6    10.0.3.226:8201    follower    false

Check logs for any issues if the nodes do not appear in `list-peers`
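The peer list can also be checked mechanically, which is handy when the migration is rehearsed several times. The sketch below parses the tabular `list-peers` output from standard input; the `check_raft_peers` function is a hypothetical helper and assumes the four-column layout shown above.

```shell
#!/bin/sh
# Sketch: summarize "vault operator raft list-peers" output read from stdin and
# fail unless the expected number of peers is present with exactly one leader.

check_raft_peers() {
  # $1 = expected total number of peers
  awk -v expected="$1" '
    # Skip the header row and the dashed separator row.
    NR > 2 && NF >= 4 {
      total++
      if ($3 == "leader") leaders++
      if ($4 == "true") voters++
    }
    END {
      printf "peers=%d leaders=%d voters=%d\n", total, leaders, voters
      exit !(total == expected && leaders == 1)
    }'
}
```

For a three-node cluster this might be used as `vault operator raft list-peers | check_raft_peers 3`.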
Task: Basic health checks
Persona: Platform engineer
Description:
- Ensure the logs for all Vault nodes don’t contain any errors
- Log into Vault UI and you should see the same configuration, secrets, and auth methods as Vault cluster A.
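In addition to the log and UI checks above, each node can be probed through Vault's `/v1/sys/health` endpoint, which signals node state through its HTTP status code. The sketch below maps the default status codes to a description; the function names are hypothetical, and `curl` is assumed to be available on the host running the check.

```shell
#!/bin/sh
# Sketch: translate the default /v1/sys/health status codes into readable text.

describe_health_code() {
  case "$1" in
    200) echo "initialized, unsealed, active" ;;
    429) echo "unsealed, standby" ;;
    472) echo "disaster recovery secondary" ;;
    473) echo "performance standby" ;;
    501) echo "not initialized" ;;
    503) echo "sealed" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

check_node_health() {
  # $1 = base URL of a Vault node, for example http://10.0.1.21:8200
  code="$(curl -s -o /dev/null -w '%{http_code}' "$1/v1/sys/health")"
  echo "$1: $(describe_health_code "$code")"
}
```

In a healthy cluster, one node would be expected to report as active and the others as standby or performance standby.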
Once the primary cluster is confirmed to be healthy and functional you can go ahead and add secondary clusters (performance replication or disaster recovery).
Step 5: Benchmark Vault cluster B
Note
Ensure that observability and monitoring is completed for Vault cluster B to be able to see how Vault is handling this additional load. Utilize the [HVD: Visibility & Management](https://developer.hashicorp.com/validated-designs/vault-operating-guides-adoption/visibility-and-management) and [HashiCorp Vault observability: Monitoring Vault at scale](https://www.hashicorp.com/blog/hashicorp-vault-observability-monitoring-vault-at-scale). We are primarily interested in performance metrics at this point.
Task: Configure and Run vault-benchmark
Persona: Platform engineer
Description:
Switch back to the EC2 instance used in Step 1, copy the configuration created there to a new file `config_cluster_b.hcl`, and change `vault_addr` to the Vault cluster B hostname / IP. Then run the binary with the new configuration path:
For example: `vault-benchmark run -config=config_cluster_b.hcl`
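To keep both benchmark configurations identical apart from the target address, the second file can be derived from the first rather than edited by hand. This is a sketch under the assumption that `vault_addr` appears on its own line, as in the sample configuration; `make_cluster_b_config` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: copy a benchmark configuration, replacing only the vault_addr line.

make_cluster_b_config() {
  # $1 = source config, $2 = destination config, $3 = Vault cluster B address (URL)
  sed "s|^vault_addr[[:space:]]*=.*|vault_addr = \"$3\"|" "$1" > "$2"
}
```

For example, `make_cluster_b_config config_cluster_a.hcl config_cluster_b.hcl "http://<VAULT CLUSTER B HOSTNAME / IP>:8200"` leaves every test definition untouched. If cluster B uses a different token, `vault_token` still needs to be updated separately.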
Task: Compare results to Step 1
Persona: Platform engineer
Description: Once the Vault Benchmark is completed, compare the results obtained in Step 1 and confirm that performance is similar or better. If the results are not satisfactory, EC2 sizing should be adjusted.
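The comparison can be made less error-prone by computing the change in request rate per operation. The sketch below assumes both result sets were saved as files in the pipe-delimited table format shown in Step 1, with the rate in the third column; `compare_rates` is a hypothetical helper, not part of Vault Benchmark.

```shell
#!/bin/sh
# Sketch: compare the "rate" column of two saved benchmark result tables.

compare_rates() {
  # $1 = results file for cluster A, $2 = results file for cluster B
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    # Skip the header row and the separator row of each table.
    FNR <= 2 { next }
    # First file: remember the baseline rate for each operation.
    NR == FNR { base[trim($2)] = trim($4) + 0; next }
    # Second file: print the baseline, the new rate, and the relative change.
    {
      op = trim($2)
      rate = trim($4) + 0
      if (op in base && base[op] > 0)
        printf "%s rate: %.2f -> %.2f (%+.1f%%)\n", op, base[op], rate, (rate - base[op]) / base[op] * 100
    }' "$1" "$2"
}
```

A negative percentage for an operation indicates that cluster B handled fewer requests per second than cluster A for that test, which is a signal to revisit EC2 sizing. Latency columns can be compared in the same way.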
Conclusion
Migrating from Vault Community Edition with an AWS DynamoDB backend to Vault Enterprise with integrated storage (Raft) provides several benefits, including enhanced performance, high availability, and HashiCorp-supported features. This guide equips you with the knowledge and tools necessary to understand and address a storage migration.
By following this guide, you should gain an understanding of the migration process, allowing you to develop a customized migration plan for your organization. Practicing the migration in a non-production environment and utilizing Vault Benchmark to create a performance baseline will increase your confidence in the success of the storage migration. This measured approach ensures that your production environment remains stable and that performance goals are met.
The migration itself, while not overly complex, requires planning and execution to minimize risks and ensure a smooth transition. Focus on thorough preparation, stakeholder communication, and repeated practice to reduce the risk of failure or suboptimal performance.
Related resources
Documentation & tutorials
- Deploying Vault in AWS on EC2 Using Terraform
- Vault Solution Design Guide | Core Design Requirements
- Design your Vault Enterprise cluster
- operator migrate - Command | Vault | HashiCorp Developer
- Benchmark Vault performance
- Preflight checklist - migrating to integrated storage | Vault | HashiCorp Developer
- https://developer.hashicorp.com/vault/docs/commands/operator/raft