Migrate Vault backend from DynamoDB to Integrated Storage


Author: Mustafa EL-Hilo

This document provides guidance and instructions on migrating from Vault Community Edition backed by AWS DynamoDB to Vault Enterprise with integrated storage (Raft) deployed in AWS. The guide equips you with the knowledge and tools necessary to understand and address the challenges associated with a Vault storage migration. These steps are intended for practice in a non-production environment first, allowing you to develop a customized migration plan for production use.

This guide also helps you gain confidence in sizing your new Vault Enterprise cluster relative to your current cluster.

Using Vault integrated storage provides the following benefits:

The Raft consensus algorithm ensures high availability and data replication across multiple nodes, providing a robust and resilient storage backend
Integrated storage is supported by HashiCorp, whereas DynamoDB is community supported only
Integrated storage is a "built-in" storage option that supports backup/restore workflows, high availability, and Enterprise replication features without relying on third-party systems
Autopilot
Automated upgrades
Vault Enterprise replication

Target audience

This guide references the platform team, which manages and operates Vault.

Prerequisites

To complete this guide, you will need:

An active AWS account
Vault cluster with DynamoDB storage backend
Vault Enterprise license

Before starting this guide, we recommend that you review the following:

Background and best practices

Due to the disruptive nature of the migration process, it is important that the platform team exercises caution. Use this guide as a way to practice on a non-production cluster first to establish baseline benchmark measurements and practice the steps. By utilizing benchmarking you will gain confidence that you are sizing the new cluster correctly.

Once you have a good understanding of the steps involved, write a migration plan that works for your team. Communicate your migration plan to all stakeholders. More importantly, practice the migration in a safe environment. Repeat the migration until the team has a high level of confidence. Your migration plan should account for:

Stakeholders that need to be notified
Timeline and schedule for the migration
Downtime and maintenance window requirements
Vault Storage migration steps and procedure
A roll-back plan
Testing and validation steps for the migration process
Verification of post-migration system integrity

Validated architecture

The main components of this integration are:

Vault cluster A (Vault CE)
- Storage Backend: AWS DynamoDB
Vault cluster B (Vault Enterprise)
- Storage backend: Integrated storage (Raft)
- IAM Access to DynamoDB used by Vault cluster A
  Note
  This Vault cluster must not be initialized or started

You will migrate from Vault CE with DynamoDB storage backend (cluster A) to Vault Enterprise with integrated storage (Raft) (cluster B).

Sizing the destination Vault cluster B to match the performance of the current Vault cluster is not trivial. Use the Vault: Solution Design Guide as a starting point and execute performance benchmark tests prior to going to production to ensure business continuity and a smooth transition.

Since you are migrating to integrated storage, you will require larger EC2 instances than those currently running Vault with a DynamoDB backend. This is because running integrated storage on EC2 requires additional I/O and compute.

The following are the high-level steps taken during the migration. For simplicity, a 3-node cluster is shown.

The migration begins with a Vault cluster that interacts with DynamoDB for read and write operations.
Shut down Vault cluster A to prepare to migrate the data from DynamoDB to the new Vault cluster. The migration operator will move the data from DynamoDB to the disk file path on node 1. Notice that the data only goes to the first node while Vault itself is still not up.
In the end state, Vault cluster B is configured to replicate data from Node 1 to the other two nodes within the cluster. All nodes maintain a consistent state, ensuring that any data written to Node 1 is immediately available on Node 2 and Node 3.

Ensure the Vault version between the two clusters is the same. Upgrading Vault should be done as a separate task (either before or after this migration).
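
A version check along these lines could be scripted before the migration. This is a minimal sketch, not an official procedure: it assumes `vault version` prints a line that begins with `Vault v<version>` and that Enterprise builds append a suffix starting with `+` (such as `+ent`). The helper name `vault_core_version` and the example strings are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduce a `vault version` line to its bare version number.
# Assumes the line looks like "Vault v<version>[+suffix] (<sha>), built <date>".
vault_core_version() {
  # Take the second field, drop the leading "v" and any "+suffix".
  printf '%s\n' "$1" | awk '{ v = $2; sub(/^v/, "", v); sub(/\+.*$/, "", v); print v }'
}

# Illustrative strings only; on real nodes, capture the output of `vault version`.
cluster_a="Vault v1.15.4 (abc123), built 2023-12-04T17:45:28Z"
cluster_b="Vault v1.15.4+ent (def456), built 2023-12-04T17:45:28Z"

if [ "$(vault_core_version "$cluster_a")" = "$(vault_core_version "$cluster_b")" ]; then
  echo "versions match"
else
  echo "versions differ: upgrade as a separate task before migrating"
fi
```

Comparing only the numeric part lets a Community Edition binary and an Enterprise binary of the same release count as matching.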

Ensure your Vault storage migration plan accounts for your seal/unseal process. For example, if you are using AWS KMS for auto-unseal you need to grant the Vault cluster B access to AWS KMS.

People and process considerations

The Vault: Operating Guide for Adoption outlines the various teams and responsibilities required to operate Vault successfully. For this task, we are primarily interested in the platform team that operates and manages Vault infrastructure.

Performing a Vault storage migration requires coordination with all the other teams, as it will require downtime during the migration process.

Workflow

The following are high level steps to complete the migration. We highly recommend you complete these steps on non-production resources first. Once the steps are understood they will need to be adjusted to your own situation. These steps assume you have created a Vault cluster of identical (or similar) properties to what you currently have in Production.

Set up a non-production environment that is a clone of the current production environment
Set up EC2 instances with the Vault Enterprise binary. These will later make up Vault cluster B. Use the HVD: Deploying Vault using Terraform as a reference for your design and configuration.
Create an EC2 to be used for vault-benchmark that can access the Vault cluster being migrated
Vault Enterprise license
Monitoring has been configured for the Vault cluster being migrated

Install Vault benchmark

We recommend measuring the performance of the existing Vault cluster before migrating the storage backend. This measured approach lets you determine if your migration was ultimately a success and establish a baseline to measure against. To measure the performance of a Vault cluster, we recommend using Vault Benchmark, a tool developed and maintained by HashiCorp.

Follow the steps outlined in the vault-benchmark source to install it on the dedicated EC2 instance for benchmarking.

Ensure that you have enabled monitoring for Vault cluster A so you can track how Vault is handling this additional load.

Benchmark Vault cluster A

The platform team should configure and run vault-benchmark on the dedicated EC2 for benchmarking.

Run vault-benchmark on a dedicated host separate from the Vault nodes to get accurate performance measurements. This establishes a baseline performance profile of the current cluster that you can compare against when benchmarking Vault cluster B later. Running the same benchmark tests on both clusters allows for direct performance comparisons.

Use the sample configuration as a starting point and add additional tests as needed. Your aim should be to create a substantial load on this Vault cluster and push it to its limit. You do not have to test all auth method types and secret engines you currently use. Rather, focus the tests on pushing the cluster to its performance limit.

Running Vault Benchmark creates Vault secrets and auth methods. Vault Benchmark performs some cleanup, but not for all tests. The duration field in the configuration defines how long the benchmark runs. During that time, Vault Benchmark creates the defined auth methods and secrets, splitting the load between tests according to their weight.

Create a file named config_cluster_a.hcl with the following content.

vault_addr = "http://<VAULT CLUSTER A HOSTNAME / IP>:8200"
vault_token = "root"
vault_namespace="root"
duration = "30s"
cleanup = true
test "approle_auth" "approle_logins" {
  weight = 50
  config {
    role {
      role_name = "benchmark-role"
      token_ttl="2m"
    }
  }
}
test "kvv2_write" "static_secret_writes" {
  weight = 50
  config {
    numkvs = 100
    kvsize = 100
  }
}

Then, run the binary with the configuration path:

$ vault-benchmark run -config=config_cluster_a.hcl

This should produce output similar to the following.

| op                   | count | rate      | throughput | mean        | 95th%       | 99th%        | successRatio |
|----------------------|-------|-----------|------------|-------------|-------------|--------------|--------------|
| approle_logins       | 2619  | 87.350307 | 87.195473  | 61.307741ms | 78.387069ms | 94.788229ms  | 100.00%      |
| static_secret_writes | 2536  | 84.530378 | 84.369422  | 55.059506ms | 82.072336ms | 109.441173ms | 100.00%      |
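
To keep a baseline that is easy to compare later, the results table can be reduced to a few columns. The following is a minimal sketch that assumes the table has the pipe-delimited layout shown above; `benchmark_to_tsv` is a hypothetical helper name, not part of vault-benchmark.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the pipe-delimited results table (read from stdin)
# into "op<TAB>throughput<TAB>95th%" lines.
benchmark_to_tsv() {
  awk -F'|' '
    NF >= 9 {
      # Trim surrounding whitespace from the columns we inspect.
      for (i = 2; i <= 7; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
      # Skip the header row and the dashed separator row.
      if ($2 == "op" || $2 ~ /^-+$/) next
      printf "%s\t%s\t%s\n", $2, $5, $7
    }'
}

# Example using the rows shown above.
benchmark_to_tsv <<'EOF'
| op                   | count | rate      | throughput | mean        | 95th%       | 99th%        | successRatio |
|----------------------|-------|-----------|------------|-------------|-------------|--------------|--------------|
| approle_logins       | 2619  | 87.350307 | 87.195473  | 61.307741ms | 78.387069ms | 94.788229ms  | 100.00%      |
| static_secret_writes | 2536  | 84.530378 | 84.369422  | 55.059506ms | 82.072336ms | 109.441173ms | 100.00%      |
EOF
```

Saving this reduced output alongside the date and instance type gives you a compact record to compare against after the migration.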

Prepare Vault cluster A

The platform team needs to stop Vault cluster A to stop read and write operations to DynamoDB during the migration process.

The actual command to stop Vault depends on how Vault is running. For example, if you are using systemctl, stop Vault with the following command:

$ sudo systemctl stop vault

Verify that Vault has stopped.

On the server, run vault status. You should get back a connection refused error.
Check Vault logs for ... [INFO] core: cluster listeners successfully shut down
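
The status check above can also be scripted. This is a minimal sketch that relies on the documented exit codes of `vault status` (0 when unsealed, 2 when sealed, 1 on error such as a refused connection); the helper name `describe_vault_status` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: interpret the exit code of `vault status`.
describe_vault_status() {
  case "$1" in
    0) echo "running-unsealed" ;;
    2) echo "running-sealed" ;;
    *) echo "error-or-unreachable" ;;   # expected once Vault has been stopped
  esac
}

# Only probe when the vault CLI is present on this host.
if command -v vault >/dev/null 2>&1; then
  rc=0
  vault status >/dev/null 2>&1 || rc=$?
  describe_vault_status "$rc"
fi
```

On a node where Vault has been stopped, you would expect the last case. Because exit code 1 covers any error, confirm the result against the Vault logs as well.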

Create DynamoDB backup

The platform team needs to create a DynamoDB backup. Follow AWS's documentation to create a DynamoDB backup.
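
As one possible approach, an on-demand backup can be requested with the AWS CLI. The sketch below prints the command by default and only runs it when `RUN=1` is set. The helper name `backup_vault_table`, the table name `vault-data`, and the backup naming scheme are assumptions for illustration; follow the AWS documentation for the method that fits your environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (or, with RUN=1, execute) an on-demand backup request
# for the DynamoDB table used by Vault cluster A.
backup_vault_table() {
  local table="$1"
  # Timestamped name so repeated practice runs do not collide.
  local name="${table}-pre-migration-$(date -u +%Y%m%dT%H%M%SZ)"
  if [ "${RUN:-0}" = "1" ]; then
    aws dynamodb create-backup --table-name "$table" --backup-name "$name"
  else
    echo "aws dynamodb create-backup --table-name $table --backup-name $name"
  fi
}

# "vault-data" is a placeholder table name; this call is a dry run.
backup_vault_table "vault-data"
```

Take the backup only after Vault cluster A has stopped, so that the backup reflects the same data that the migration will read.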

Create and run Vault migration configuration on Vault cluster B

The platform team needs to verify the status of Vault cluster B. This cluster should not be initialized or started.

Follow your preferred method to install the Enterprise Vault binary. The Vault enterprise binary version should match the current Vault version in Vault cluster A.

Alternatively, follow the HVD: Deploying Vault using Terraform. Take care not to start or initialize Vault, which will require modifying the initialization script provided.

The platform team needs to create a Vault migration configuration. This will migrate the data from DynamoDB to one of the EC2s that will eventually be part of Vault cluster B.

Pick a node in Vault cluster B and create a file migrate.hcl in /etc/vault.d/.
Configuration details for storage_source and storage_destination can be found in their respective storage configuration. In this case it's DynamoDB and integrated storage.

The following is an example migration configuration. You will need to replace ha_enabled, region, table, node_id and cluster_addr.

storage_source "dynamodb" {
 ha_enabled = "<true or false>"
 region     = "<REGION>"
 table      = "<DYNAMODB TABLE NAME>"
}

storage_destination "raft" {
 path = "/opt/vault/data"
 node_id = "<Hostname / IP of current node>"
}

cluster_addr = "https://<Hostname / IP  of current node>:8201"
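
Before running the migration, it can help to confirm the state of the destination path. The following is a minimal sketch that assumes you want the Raft data directory to exist and to be empty on a node that has never been started; `check_raft_path` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the storage_destination path.
# Assumption: the migration should start from an existing, empty directory.
check_raft_path() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing"
  elif [ -n "$(ls -A "$dir")" ]; then
    echo "not-empty"
  else
    echo "ok"
  fi
}

# Path taken from the example migration configuration.
check_raft_path "/opt/vault/data"
```

Run the check as the same user that will run Vault, so that permission problems on the directory surface before the migration starts.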

The platform team needs to run the Vault migration command (vault operator migrate).

Note

The same user should be used to run this command as the user that will later run Vault. As outlined in the Vault: Solution Design Guide, we will use the vault user.

To start the migration as the vault user, use sudo -u vault:
```
$ sudo -u vault vault operator migrate -config /etc/vault.d/migrate.hcl
```
Note that this command can take time to complete depending on how much data is in Vault. Use tmux or screen to run the command and detach from your session. Note the amount of time this migration takes since you will need to account for it in your migration plan.
Once migration is done, Vault will return the following message:
```
Success! All of the keys have been migrated.
```

Stand up Vault cluster B

Vault configurations vary depending on the features enabled. Your next step here will depend on what you have enabled in the Vault configuration. You only need to perform one of the following 2 tasks.

The platform team needs to add nodes to Vault cluster B using retry_join.

Start Vault on all nodes in cluster B. It is safe to delete the migrate.hcl file on the node and proceed with the final Vault configuration for all the other nodes.
Unseal Vault on all nodes
- If using Shamir seal, use the same keys from Vault cluster A
- If using AWS KMS auto unseal, then Vault would be unsealed after it is started. Check that AWS KMS IAM permissions are granted to the EC2 instances. Use vault status to check.

Check if all Vault nodes join the Raft cluster

$ vault operator raft list-peers
Node          Address            State       Voter
----          -------            -----       -----
vault_1     10.0.1.21:8201      leader      true
vault_2     10.0.2.18:8201      follower    true
vault_3     10.0.3.236:8201     follower    true
vault_4     10.0.1.32:8201      follower    false
vault_5     10.0.2.39:8201      follower    false
vault_6     10.0.3.226:8201     follower    false

If any nodes do not appear in the list-peers output, check the logs for issues.

The platform team needs to add nodes to Vault cluster B without using retry_join.

Start Vault on all nodes in cluster B. It's safe to delete the migrate.hcl file on the node and proceed with the final Vault configuration for all the other nodes.

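Add Vault nodes to the Raft storage cluster. Run the following on each node being added, pointing it at the API address of the node that holds the migrated data:

$ vault operator raft join http://<Hostname / IP of leader node>:8200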

Unseal Vault on all nodes
- If using Shamir seal use the same keys from Vault cluster A
- If using AWS KMS auto unseal then Vault would be unsealed after it's started. Check that AWS KMS IAM permissions are granted to the EC2 instances. Use vault status to check.

Check that all Vault nodes have been added to the Raft cluster

$ vault operator raft list-peers
Node          Address            State       Voter
----          -------            -----       -----
vault_1     10.0.1.21:8201     leader      true
vault_2     10.0.2.18:8201     follower    true
vault_3     10.0.3.236:8201    follower    true
vault_4     10.0.1.32:8201     follower    false
vault_5     10.0.2.39:8201     follower    false
vault_6     10.0.3.226:8201    follower    false

If any nodes do not appear in the list-peers output, check the logs for issues.
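
Whichever join method you use, the peer list can be summarized with a small script. This is a minimal sketch that assumes the four-column layout shown in the sample output; `summarize_peers` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize `vault operator raft list-peers` output from stdin.
summarize_peers() {
  awk '
    NF >= 4 {
      # Skip the header row and the dashed separator row.
      if ($1 == "Node" || $1 ~ /^-+$/) next
      total++
      if ($3 == "leader") leaders++
      if ($4 == "true") voters++
    }
    END { printf "peers=%d leaders=%d voters=%d\n", total, leaders, voters }'
}

# Example using the sample output from this guide.
summarize_peers <<'EOF'
Node          Address            State       Voter
----          -------            -----       -----
vault_1     10.0.1.21:8201     leader      true
vault_2     10.0.2.18:8201     follower    true
vault_3     10.0.3.236:8201    follower    true
vault_4     10.0.1.32:8201     follower    false
vault_5     10.0.2.39:8201     follower    false
vault_6     10.0.3.226:8201    follower    false
EOF
```

For the sample above this prints `peers=6 leaders=1 voters=3`. Against a live cluster, pipe the command into the helper: `vault operator raft list-peers | summarize_peers`.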

The platform team needs to perform basic health checks.

Ensure the logs for all Vault nodes don't contain any errors
Log in to the Vault UI and confirm that you see the same configuration, secrets, and auth methods as in Vault cluster A.

Once the primary cluster is confirmed to be healthy and functional you can go ahead and add secondary clusters (performance replication or disaster recovery).

Benchmark Vault cluster B

Note

Ensure that observability and monitoring are configured for Vault cluster B so you can see how Vault handles this additional load. Refer to the [HVD: Observability & Management](/validated-designs/vault-operating-guides-adoption/observability-and-management) and [HashiCorp Vault observability: Monitoring Vault at scale](https://www.hashicorp.com/blog/hashicorp-vault-observability-monitoring-vault-at-scale). We are primarily interested in performance metrics at this point.

The platform team needs to configure and run vault-benchmark on the dedicated EC2 for benchmarking.

Copy the configuration created earlier to a new file named config_cluster_b.hcl and change vault_addr to the Vault cluster B hostname or IP. Then, run the binary with the configuration path:

$ vault-benchmark run -config=config_cluster_b.hcl

Once Vault Benchmark completes, compare the results to those recorded for cluster A. Confirm that performance is similar or better. If the results are not satisfactory, you may need to adjust the size of the EC2 instances in cluster B.
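
The comparison can be made less error-prone with a small script. The sketch below assumes you saved each results table to a text file in the pipe-delimited layout that vault-benchmark prints. `compare_throughput` is a hypothetical helper name, and the numbers in the demo are made up for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<op> <ratio>" where ratio = cluster B throughput
# divided by cluster A throughput, for every test present in both files.
compare_throughput() {
  awk -F'|' '
    NF >= 9 {
      op = $2; tp = $5
      gsub(/[ \t]/, "", op); gsub(/[ \t]/, "", tp)
      if (op == "op" || op ~ /^-+$/) next           # header and separator rows
      if (FNR == NR) { base[op] = tp; next }        # first file: baseline
      if (op in base && base[op] + 0 > 0)
        printf "%s %.2f\n", op, tp / base[op]
    }' "$1" "$2"
}

# Demo with made-up numbers (not real measurements).
a="$(mktemp)"; b="$(mktemp)"
cat > "$a" <<'EOF'
| op        | count | rate  | throughput | mean | 95th% | 99th% | successRatio |
|-----------|-------|-------|------------|------|-------|-------|--------------|
| kv_writes | 3000  | 100.0 | 100.0      | 50ms | 80ms  | 100ms | 100.00%      |
EOF
cat > "$b" <<'EOF'
| op        | count | rate  | throughput | mean | 95th% | 99th% | successRatio |
|-----------|-------|-------|------------|------|-------|-------|--------------|
| kv_writes | 3600  | 120.0 | 120.0      | 40ms | 70ms  | 90ms  | 100.00%      |
EOF
compare_throughput "$a" "$b"
rm -f "$a" "$b"
```

A ratio at or above 1.00 suggests cluster B matches or exceeds the baseline for that test. Review the latency percentiles and success ratio as well, since throughput alone does not show tail latency.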

Conclusion

Migrating from Vault Community Edition with an AWS DynamoDB backend to Vault Enterprise with integrated storage (Raft) provides several benefits, including enhanced performance, high availability, and HashiCorp-supported features. This guide equips you with the knowledge and tools necessary to understand and address a storage migration.

By following this guide, you should gain an understanding of the migration process, allowing you to develop a customized migration plan for your organization. Practicing the migration in a non-production environment and utilizing Vault Benchmark to create a performance baseline will increase your confidence in the success of the storage migration. This measured approach ensures that your production environment remains stable and that performance goals are met.

The migration itself, while not overly complex, requires planning and execution to minimize risks and ensure a smooth transition. Focus on thorough preparation, stakeholder communication, and repeated practice to reduce the risk of failure or suboptimal performance.

For more information on migrating to integrated storage, refer to the following resources:
