
Disaster Recovery Replication Failover and Failback

  • 54min

  • Enterprise
  • Vault

Enterprise Feature: This tutorial covers Disaster Recovery Replication, a Vault Enterprise feature that requires a Vault Enterprise Standard license.

A disaster recovery (DR) strategy to protect your Vault deployment from catastrophic failure of an entire cluster helps reduce recovery efforts and minimize outage downtime. Vault Enterprise supports multi-datacenter deployments, so that you can replicate data across datacenters for improved performance and disaster recovery capabilities.

Challenge

When a disaster occurs, a Vault operator must be able to respond by failing over from the affected cluster. Similarly, failing back to the original cluster state is typically required after you resolve the incident.

Solution

Vault Enterprise Disaster Recovery (DR) Replication features failover and failback capabilities to assist in recovery from catastrophic failure of entire clusters.

Learning how to fail over a DR primary cluster to a secondary cluster, and fail back to the original cluster state, is crucial for operating Vault in more than one datacenter.

Use the basic example workflow in this tutorial scenario to get acquainted with the steps involved in failing over and failing back using the Vault API, CLI, or UI.

Prerequisites

This intermediate Vault Enterprise operations tutorial assumes that you already have some working knowledge of operating Vault with the API, CLI, or web UI. If you aren't familiar with the Vault Enterprise Disaster Recovery replication functionality, you should review the Disaster Recovery Replication Setup tutorial before proceeding with this tutorial.

You also need the following resources to complete the tutorial hands-on scenario:

  • Docker installed.

  • Vault binary installed on your PATH for CLI operations. You must use a Vault Enterprise server throughout this tutorial, but you can use the OSS binary for all CLI examples.

  • curl to use the API command examples.

  • jq for parsing and pretty-printing JSON output.

  • A web browser for accessing the Vault UI.

Policy requirements

You must have a token with highly privileged policies, such as the root token, to configure Vault Enterprise replication. Some API endpoints also require the sudo capability.

If you aren't using the root token, expand the following example to learn more about the ACL policies required to perform the operations described in this tutorial.

# To enable DR primary
path "sys/replication/dr/primary/enable" {
  capabilities = ["create", "update"]
}

# To generate a secondary token required to add a DR secondary
path "sys/replication/dr/primary/secondary-token" {
  capabilities = ["create", "update", "sudo"]
}

# To create ACL policies
path "sys/policies/acl/*" {
  capabilities = ["create", "update", "list"]
}

# Create a token role for batch DR operation token
path "auth/token/roles/*" {
  capabilities = ["create", "update"]
}

# Create a token
path "auth/token/create" {
  capabilities = ["create", "update"]
}

# To demote the primary to secondary
path "sys/replication/dr/primary/demote" {
  capabilities = ["create", "update"]
}

# To enable DR secondary
path "sys/replication/dr/secondary/enable" {
  capabilities = ["create", "update"]
}

# To generate an operation token
path "sys/replication/dr/secondary/generate-operation-token/*" {
  capabilities = ["create", "update"]
}

# To promote the secondary cluster to be primary
path "sys/replication/dr/secondary/promote" {
  capabilities = ["create", "update"]
}

# To update the assigned primary cluster 
path "sys/replication/dr/secondary/update-primary" {
  capabilities = ["create", "update"]
}

# If you choose to disable the original primary cluster post-recovery
path "sys/replication/dr/primary/disable" {
  capabilities = ["create", "update"]
}

NOTE: If you aren't familiar with policies, complete the policies tutorial.
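If you choose to use a scoped policy instead of the root token, you could save the example above to a file and attach it to a token. A minimal sketch; the policy name dr-ops and file name dr-ops-policy.hcl are hypothetical:

```shell
# Write the ACL policy above (saved as dr-ops-policy.hcl) to Vault under a
# hypothetical name, then create a token that carries it.
vault policy write dr-ops dr-ops-policy.hcl

# Tokens created with this policy can perform the replication operations
# listed above; sudo-protected paths rely on the sudo capability granted
# in the policy itself.
vault token create -policy=dr-ops
```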

Scenario introduction

DR Prerequisites

To successfully follow this tutorial, you will deploy 2 single-node Vault Enterprise clusters with integrated storage:

  • Cluster A is the initial primary cluster.
  • Cluster B is the initial secondary cluster.

NOTE: The tutorial scenario uses single-node Vault clusters as a convenience to the learner and to simplify the deployment. For production Vault deployments, you should use highly available (HA) integrated storage described in the Vault with Integrated Storage Deployment Guide tutorial.

You will use these 2 clusters to simulate the following failover and failback workflows.

Failover to DR secondary cluster

Failover workflow diagram

In the current state, cluster A is the primary and replicates data to the secondary cluster B. You will perform the following actions to fail over so that cluster B becomes the new primary cluster.

  1. Generate a batch DR operation token on cluster A.
  2. Promote DR cluster B to become the new primary.
  3. Demote cluster A to become a secondary.
  4. Point cluster A to the new primary cluster B.
  5. Test access to Vault data while cluster B is the primary.
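At the command level, these steps map onto the replication API paths shown in the policy requirements section. The following is only a sketch using this scenario's cluster addresses; the token values in angle brackets are placeholders you obtain during the workflow:

```shell
# Promote cluster B to become the new primary, supplying the batch DR
# operation token that was generated on cluster A and replicated to B.
vault write -address=http://127.0.0.1:8220 sys/replication/dr/secondary/promote \
    dr_operation_token="<batch-DR-operation-token>"

# Demote cluster A to a secondary.
vault write -f -address=http://127.0.0.1:8200 sys/replication/dr/primary/demote

# Point cluster A at the new primary, cluster B, using a secondary
# activation token generated on cluster B after its promotion.
vault write -address=http://127.0.0.1:8200 sys/replication/dr/secondary/update-primary \
    dr_operation_token="<batch-DR-operation-token>" \
    token="<secondary-token-from-cluster-b>"
```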

Failback to original primary cluster

Failback workflow diagram

In the current state, cluster B is the primary and replicates data to the secondary cluster A. You will perform the following actions to fail back to the original cluster replication state.

  1. Generate a secondary token on cluster A.
  2. Promote cluster A.
  3. Demote cluster B.
  4. Point cluster B to cluster A, so that cluster B is a DR secondary of cluster A.
  5. Test access to Vault data while cluster A is the primary cluster.
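The failback steps mirror the failover commands in the opposite direction. A sketch under this scenario's addresses, with placeholder token values:

```shell
# Promote cluster A back to primary.
vault write -address=http://127.0.0.1:8200 sys/replication/dr/secondary/promote \
    dr_operation_token="<batch-DR-operation-token>"

# Demote cluster B to a secondary.
vault write -f -address=http://127.0.0.1:8220 sys/replication/dr/primary/demote

# Point cluster B at cluster A using a secondary activation token
# generated on cluster A once it is primary again.
vault write -address=http://127.0.0.1:8220 sys/replication/dr/secondary/update-primary \
    dr_operation_token="<batch-DR-operation-token>" \
    token="<secondary-token-from-cluster-a>"
```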

Prepare environment

The goal of this section is for you to prepare and deploy the Vault cluster containers.

You will start the Vault cluster Docker containers, and perform some initial configuration to ready the Vault clusters for replication.

  1. This tutorial requires a Vault Enterprise Standard license, so you need to first specify your license string as the value of the MY_VAULT_LICENSE environment variable.

    $ export MY_VAULT_LICENSE=C0FFEEU43BK5HGYYTOJZW2QNTNNNEWU33JJYZE26SNK52G2TLNJV22SNSZP2GWSNDYL2V2E3KNKRDGYTTKKV2E42TLGNH2IZZQLJCESNKNNJNXOSLJO5UVSN2WPJSEOOLULJNEUZTBK5IWST3JJE2U6V2WNVN2OVJSL2JTZ52ZNV2G2TCXJUYU2RCVORNVIWJUJVBTZ6CNNVNGQWJSLE2U42SCNRNXUSLJJRBUU4DCNZHDZWKXPBZVSWCSOBRDENLGN2LVC2KPN2EXCSLJO5UWCWCOPJS2OVTGNRDWY5C2KNETNSLKJ23U22SJORGUI23UJVKE4VKNKRKTNTL2IU3E22SBOVHGUVJQJ5KEK6KNKRRTGV3JJ2ZUS3SOGBNVQSRQLZZVE4DCK5KWST3JJ24U2RCJP2G2IQJVJRKEK6SWIRZXOT3KI23U62SBO5LWSSLTJ2WVNNDDI5WHSWKYKJYGENRVNZSEO3DULJJUSNSJNJEXOTLKKV2E2VCBORGUIRSVJVCECNSNIRZTNTKEIJQUS2LXN2SEOVTZNJLWY5KZLBJHZYRSGVTGIR3NORN2GSJWJ2VES52NNJKXITKUI22E2RCKKVGUIQJWJVCECNSNIRBGCSLJO5UWGSCKOZNEQVTKNRBUSNSJNZNGQZCXPZYES2LXN2NG26DILIZU22KPNZZWSYSXH2VWIV3YNRRXSSJWK54UU5DEK54DZYKTG2VVS6JRPJNTERTTLJJUS42JNVSHNZDNKZ4WE3KGOVNTEVLUNNDTS43BK5HDKSLJO5UVSV2SGJNVONLKLJLVC5C2I5DDZWKTG23WG3JZGBN2OTRQN2LTS5KJNQYTSZSRHU6S4RLNNVTE2WL2J5NWYV3NJJGUS52NIJLSWZ2GN55GUT2KKR2W443VO43XGWSVOJSGON22IVNUYVTGOZLUOSLRIRRE6WTNOVWHERKJIRJUO3KGIJGHE2TOOZSHI5DD25DVGTRWK4YUCNJRNRZWWY3BGRS2NNBTGN42Z53NKZWGC5SKKZ2HZSTYJ2ETSRBWKVDEYVLBKZIGU22XJJ2GGRBWOBQWYNTPJ5TEO3SLGJ52ZS2KKJWUOSCWGNSVU53RIZSSW3ZXNNXXGK2BKRHGQUC2N5JS6S2WL2TS6SZLNRDVZ52NG5VEE6CJG5DU6YLLGZKWC2LBJBXWK2ZQKJKG6NZSIRIT2PI
    

    NOTE: Be sure to use your Vault Enterprise license string value, and not the non-functional example value shown here.

  2. Export the environment variable HC_LEARN_LAB with a value that represents the lab directory, /tmp/learn-vault-lab.

    $ export HC_LEARN_LAB=/tmp/learn-vault-lab
    
  3. Make the directory.

    $ mkdir $HC_LEARN_LAB
    
  4. Change into the lab directory.

    $ cd $HC_LEARN_LAB
    

    You will perform all steps of the tutorial scenario from within this directory.

  5. Create directories for Vault configuration and data for the 2 clusters.

    $ mkdir -p cluster-a/config cluster-a/data cluster-b/config cluster-b/data
    
  6. Pull the latest Vault Enterprise Docker image.

    NOTE: You must log into Docker Hub before pulling the Vault Enterprise image.

    $ docker pull hashicorp/vault-enterprise:latest
    latest: Pulling from hashicorp/vault-enterprise
    ...snip...
    Status: Downloaded newer image for hashicorp/vault-enterprise:latest
    docker.io/hashicorp/vault-enterprise:latest
    
  7. Create a Docker network named learn-vault.

    $ docker network create learn-vault
    d6a8247e3f138344c4686a517834ec2e2af68be9d728afb08bcfe21aae616785
    

Start the cluster A container

Each cluster container uses a unique Vault server configuration file.

  1. Create the cluster A configuration file.

    $ cat > cluster-a/config/vault-server.hcl <<EOF
    ui = true
    listener "tcp" {
      tls_disable = 1
      address = "[::]:8200"
      cluster_address = "[::]:8201"
    }
    
    storage "raft" {
      path = "/vault/file"
    }
    EOF
    

    NOTE: Although the listener stanza disables TLS (tls_disable = 1) for this tutorial, Vault should always be used with TLS in production to enable secure communication between clients and the Vault server. This configuration requires a certificate file and key file on each Vault host.

  2. Start the cluster A container.

    $ docker run \
          --name=vault-enterprise-cluster-a \
          --hostname=cluster-a \
          --network=learn-vault \
          --publish 8200:8200 \
          --env VAULT_ADDR="http://localhost:8200" \
          --env VAULT_CLUSTER_ADDR="http://cluster-a:8201" \
          --env VAULT_API_ADDR="http://cluster-a:8200" \
          --env VAULT_RAFT_NODE_ID="cluster-a" \
          --env VAULT_LICENSE="$MY_VAULT_LICENSE" \
          --volume $PWD/cluster-a/config/:/vault/config \
          --volume $PWD/cluster-a/data/:/vault/file:z \
          --cap-add=IPC_LOCK \
          --detach \
          --rm \
          hashicorp/vault-enterprise vault server -config=/vault/config/vault-server.hcl
    
  3. Confirm that the cluster A container is up.

    $ docker ps -f name=vault-enterprise --format "table {{.Names}}\t{{.Status}}"
    

    Example expected output:

    NAMES                        STATUS
    vault-enterprise-cluster-a   Up 3 seconds
    
  4. Initialize the cluster A Vault, writing the initialization information, including the unseal key and initial root token, to the file cluster-a/.init.

    NOTE: The initialization example here uses the Shamir's Secret Sharing seal with one key share for convenience in the hands-on lab. You should use more than one key share, or an auto seal type, in production.

    $ vault operator init \
        -address=http://127.0.0.1:8200 \
        -key-shares=1 \
        -key-threshold=1 \
        > $PWD/cluster-a/.init
    
  5. Export the environment variable CLUSTER_A_UNSEAL_KEY with the cluster A unseal key as its value.

    $ export CLUSTER_A_UNSEAL_KEY="$(grep 'Unseal Key 1' cluster-a/.init | awk '{print $NF}')"
    
  6. Export the environment variable CLUSTER_A_ROOT_TOKEN with the cluster A initial root token as its value.

    $ export CLUSTER_A_ROOT_TOKEN="$(grep 'Initial Root Token' cluster-a/.init | awk '{print $NF}')"
    
  7. Unseal Vault in cluster A.

    $ vault operator unseal -address=http://127.0.0.1:8200 $CLUSTER_A_UNSEAL_KEY
    

    Successful output example:

    Key                     Value
    ---                     -----
    Seal Type               shamir
    Initialized             true
    Sealed                  false
    Total Shares            1
    Threshold               1
    Version                 1.12.2+ent
    Build Date              2022-11-23T21:33:30Z
    Storage Type            raft
    Cluster Name            vault-cluster-5d1417f7
    Cluster ID              0aa8c2eb-be93-03b3-bc22-a0b349fd8938
    HA Enabled              true
    HA Cluster              n/a
    HA Mode                 standby
    Active Node Address     <none>
    Raft Committed Index    57
    Raft Applied Index      57
    

    Once unsealed, Vault returns a status with Sealed having a value of false. Vault is now unsealed and ready for use in cluster A.

Start the cluster B container

Repeat a variation of the earlier workflow to start cluster B.

  1. Create the cluster B configuration file.

    $ cat > cluster-b/config/vault-server.hcl << EOF
    ui = true
    listener "tcp" {
      tls_disable = 1
      address = "[::]:8220"
      cluster_address = "[::]:8221"
    }
    
    storage "raft" {
      path = "/vault/file"
    }
    EOF
    

    Network ports: Cluster B uses a different and non-standard set of port numbers for the Vault API and cluster addresses than cluster A. This is for simplicity in communicating with each cluster from the Docker host.

  2. Start the cluster B container.

    $ docker run \
          --name=vault-enterprise-cluster-b \
          --hostname=cluster-b \
          --network=learn-vault \
          --publish 8220:8220 \
          --env VAULT_ADDR="http://localhost:8220" \
          --env VAULT_CLUSTER_ADDR="http://cluster-b:8221" \
          --env VAULT_API_ADDR="http://cluster-b:8220" \
          --env VAULT_RAFT_NODE_ID="cluster-b" \
          --env VAULT_LICENSE="$MY_VAULT_LICENSE" \
          --volume $PWD/cluster-b/config/:/vault/config \
          --volume $PWD/cluster-b/data/:/vault/file:z \
          --cap-add=IPC_LOCK \
          --detach \
          --rm \
          hashicorp/vault-enterprise vault server -config=/vault/config/vault-server.hcl
    
  3. Check the container status.

    $ docker ps -f name=vault-enterprise --format "table {{.Names}}\t{{.Status}}"
    NAMES                        STATUS
    vault-enterprise-cluster-b   Up 6 seconds
    vault-enterprise-cluster-a   Up About a minute
    
  4. Initialize the cluster B Vault, writing the initialization information, including the unseal key and initial root token, to the file cluster-b/.init.

    $ vault operator init \
        -address=http://127.0.0.1:8220 \
        -key-shares=1 \
        -key-threshold=1 \
        > cluster-b/.init
    
  5. Export the environment variable CLUSTER_B_UNSEAL_KEY with the cluster B unseal key as its value.

    $ export CLUSTER_B_UNSEAL_KEY="$(grep 'Unseal Key 1' cluster-b/.init | awk '{print $NF}')"
    
  6. Export the environment variable CLUSTER_B_ROOT_TOKEN with the cluster B initial root token as its value.

    $ export CLUSTER_B_ROOT_TOKEN="$(grep 'Initial Root Token' cluster-b/.init | awk '{print $NF}')"
    
  7. Unseal Vault in cluster B.

    $ vault operator unseal -address=http://127.0.0.1:8220 $CLUSTER_B_UNSEAL_KEY
    

    Successful output example:

    Key                     Value
    ---                     -----
    Seal Type               shamir
    Initialized             true
    Sealed                  false
    Total Shares            1
    Threshold               1
    Version                 1.12.1+ent
    Build Date              2022-10-28T12:10:32Z
    Storage Type            raft
    Cluster Name            vault-cluster-4ccfd107
    Cluster ID              0fc163cd-b3bf-1921-f740-c03f645065d2
    HA Enabled              true
    HA Cluster              n/a
    HA Mode                 standby
    Active Node Address     <none>
    Raft Committed Index    59
    Raft Applied Index      59
    

You are now prepared to configure DR replication between cluster A and cluster B using the Vault CLI, HTTP API, or UI.

Configure replication

The basic steps to configure DR replication are as follows:

DR Replication

  1. Enable DR primary replication on cluster A.
  2. Generate secondary token on cluster A.
  3. Enable DR secondary replication on cluster B.
  4. Confirm replication status on both clusters.

Enable replication on cluster A

  1. Export a VAULT_ADDR environment variable to communicate with the cluster A Vault.

    $ export VAULT_ADDR=http://127.0.0.1:8200
    
  2. Log in with the cluster A initial root token.

    $ vault login -no-print $CLUSTER_A_ROOT_TOKEN
    
  3. Enable DR replication on cluster A.

    $ vault write -f sys/replication/dr/primary/enable
    WARNING! The following warnings were returned from Vault:
    
    * This cluster is being enabled as a primary for replication. Vault will be
    unavailable for a brief period and will resume service shortly.
    
  4. Generate a secondary token and assign its value to the exported environment variable DR_SECONDARY_TOKEN.

    $ export DR_SECONDARY_TOKEN="$(vault write -field wrapping_token \
        sys/replication/dr/primary/secondary-token id=cluster-b)"
    
  5. Confirm the DR_SECONDARY_TOKEN environment variable value.

    $ echo $DR_SECONDARY_TOKEN
    

    The output should resemble this example:

    c0ffeeciOiJFUzUxMiIsInR5cCI6IkpXVCJ9.eyJhY2Nlc3NvciI6IiIsImFkZHIiOiJodHRwOi8vcHJpbWFyeTo4MjAwIiwiZXhwIjoxNjYzMTcxMTQ1LCJpYXQiOjE2NjMxNjkzNDUsImp0aSI6Imh2cy4waElpSHlGN2MwaWIweG5nNjJlbFJSYXMiLCJuYmYiOjE2NjMxNjkzNDAsInR5cGUiOiJ3cmFwcGluZyJ9.AOgAK6_-V0rXnTNZid1M0BHQBhsdg_W2RcJTydY-v5NAOBUW6LIjFv00pYpjVXYuTXYolTOmcu0Vwja2l2FXNEBNABzsdzo-lfu0J9vudhgh98Z543YsZuDZ1Y4PBb2WbJIx0Qvtw1P5-DqutEAtl-oJejm9wsVVlzjcMgMdLJLBOF-6
    
Alternatively, using the HTTP API:

  1. Enable DR replication on cluster A by invoking the /sys/replication/dr/primary/enable endpoint.

    Example:

    $ curl --silent --request POST \
        --header "X-Vault-Token: $CLUSTER_A_ROOT_TOKEN" \
        --data '{}' \
        http://127.0.0.1:8200/v1/sys/replication/dr/primary/enable \
        | jq
    

    Successful output example:

    {
    "request_id": "438038d2-0a0d-01b7-edbe-4e0cab1a2614",
    "lease_id": "",
    "renewable": false,
    "lease_duration": 0,
    "data": null,
    "wrap_info": null,
    "warnings": [
       "This cluster is being enabled as a primary for replication. Vault will be unavailable for a brief period and will resume service shortly."
    ],
    "auth": null
    }
    
  2. Generate a secondary token by invoking the /sys/replication/dr/primary/secondary-token endpoint, and assign its value to the exported environment variable DR_SECONDARY_TOKEN.

    Example:

    $ export DR_SECONDARY_TOKEN=$(curl \
        --silent \
        --request POST \
        --header "X-Vault-Token: $CLUSTER_A_ROOT_TOKEN" \
        --data '{"id": "cluster-b"}' \
        http://127.0.0.1:8200/v1/sys/replication/dr/primary/secondary-token \
        | jq -r '.wrap_info.token' | tr -d '"')
    
  3. Confirm the environment variable value.

    $ echo $DR_SECONDARY_TOKEN
    

    The output should resemble this example:

    c0ffeeciOiJFUzUxMiIsInR5cCI6IkpXVCJ9.eyJhY2Nlc3NvciI6IiIsImFkZHIiOiJodHRwOi8vcHJpbWFyeTo4MjAwIiwiZXhwIjoxNjYzMTcxMTQ1LCJpYXQiOjE2NjMxNjkzNDUsImp0aSI6Imh2cy4waElpSHlGN2MwaWIweG5nNjJlbFJSYXMiLCJuYmYiOjE2NjMxNjkzNDAsInR5cGUiOiJ3cmFwcGluZyJ9.AOgAK6_-V0rXnTNZid1M0BHQBhsdg_W2RcJTydY-v5NAOBUW6LIjFv00pYpjVXYuTXYolTOmcu0Vwja2l2FXNEBNABzsdzo-lfu0J9vudhgh98Z543YsZuDZ1Y4PBb2WbJIx0Qvtw1P5-DqutEAtl-oJejm9wsVVlzjcMgMdLJLBOF-6
    
Alternatively, using the web UI:

  1. From your terminal session, copy the value of $CLUSTER_A_ROOT_TOKEN to your system clipboard. You can use a system-specific clipboard utility like pbcopy for this step.

    $ echo $CLUSTER_A_ROOT_TOKEN | pbcopy
    
  2. Open a web browser, and access the cluster A Vault UI at http://127.0.0.1:8200/ui.

  3. Sign into Vault by pasting the cluster A root token value into the Token field.

  4. Click the arrow next to Status and click Enable under REPLICATION. Enable in the status menu

  5. Select the Disaster Recovery (DR) radio button. Selecting 'disaster recovery' from the two options

  6. Click Enable Replication.

  7. In the Known Secondaries section, click Add secondary.

    Adding a secondary from the DR primary

  8. Populate the Secondary ID field with cluster-b, and click Generate token. Generating a secondary token

  9. Click Copy & Close to copy the token, which you will need to enable the DR secondary cluster B.

    Generated a secondary token

    NOTE: Paste the token value somewhere safe where you can retrieve it for a later step. DO NOT leave it in the system clipboard, as it will be overwritten by another copied value later.

Enable replication on cluster B

Next, enable replication on cluster B. You must perform the following operations on cluster B. Vault uses the secondary token to automatically configure cluster B as a secondary of cluster A.

  1. Export a VAULT_ADDR environment variable to communicate with Vault in cluster B.

    $ export VAULT_ADDR=http://127.0.0.1:8220
    
  2. Log in with the cluster B initial root token.

    $ vault login -no-print $CLUSTER_B_ROOT_TOKEN
    
  3. Enable DR replication on the secondary cluster.

    Warning: This immediately clears all data in the secondary cluster.

    $ vault write sys/replication/dr/secondary/enable token=$DR_SECONDARY_TOKEN
    

    Expected output:

    WARNING! The following warnings were returned from Vault:

    * Vault has successfully found secondary information; it may take a while to
    perform setup tasks. Vault will be unavailable until these tasks and initial
    sync complete.

Alternatively, using the HTTP API:

  1. Create an API request payload containing the token obtained from cluster A.

    $ tee payload.json << EOF
    {
      "token": "$DR_SECONDARY_TOKEN"
    }
    EOF
    
  2. Enable DR replication on cluster B.

    Warning: This immediately clears all data in the secondary cluster.

    $ curl \
          --request POST http://127.0.0.1:8220/v1/sys/replication/dr/secondary/enable \
          --header "X-Vault-Token: $CLUSTER_B_ROOT_TOKEN" \
          --data @payload.json \
          | jq
    

    Example output:

    {
       "request_id": "92ec703e-ef94-ae4c-4e45-d7229798b224",
       "lease_id": "",
       "renewable": false,
       "lease_duration": 0,
       "data": null,
       "wrap_info": null,
       "warnings": [
          "Vault has successfully found secondary information; it may take a while to perform setup tasks. Vault will be unavailable until these tasks and initial sync complete."
       ],
       "auth": null
    }
Alternatively, using the web UI:

  1. From your terminal session, copy the value of $CLUSTER_B_ROOT_TOKEN to your system clipboard. You can use a system-specific clipboard utility like pbcopy for this step.

    $ echo $CLUSTER_B_ROOT_TOKEN | pbcopy
    
  2. In another browser tab, access the cluster B Vault UI at http://127.0.0.1:8220/ui.

  3. Sign into Vault by pasting in the cluster B root token value.

  4. Select the arrow next to Status and click Enable under REPLICATION.

    Enable in the status menu

  5. Select the Disaster Recovery (DR) radio button and select secondary under Cluster mode.

  6. Paste the token you copied from the primary in the Secondary activation token field.

    Choosing 'Disaster Recovery' for a secondary mode

  7. Click Enable replication. (Warning: This immediately clears all data in the secondary cluster.)

    DR Secondary enabled

    Click the Details tab to see replication details.

    DR secondary enabled

Confirm replication status

Now that you have successfully enabled DR replication, you will enable a new secrets engine and create a secret on cluster A, then confirm replication status between the clusters.

Enable the KV version 2 secrets engine, write a secret, and verify the replication status.

  1. Export a VAULT_ADDR environment variable to communicate with the primary cluster Vault.

    $ export VAULT_ADDR=http://127.0.0.1:8200
    
  2. Log in with the cluster A root token.

    $ vault login -no-print $CLUSTER_A_ROOT_TOKEN
    
  3. Enable a Key/Value version 2 secrets engine at the path replicated-secrets.

    $ vault secrets enable -path=replicated-secrets -version=2 kv
    
  4. Put a test secret into the newly enabled secrets engine.

    $ vault kv put replicated-secrets/learn-failover failover=false secret=984UIFBH4HK3M84
    

    Successful example output:

    ============= Secret Path =============
    replicated-secrets/data/learn-failover
    
    ======= Metadata =======
    Key                Value
    ---                -----
    created_time       2022-09-13T18:44:09.734060046Z
    custom_metadata    <nil>
    deletion_time      n/a
    destroyed          false
    version            1
    
  5. Check the replication status on the primary cluster.

    $ vault read sys/replication/dr/status
    Key                     Value
    ---                     -----
    cluster_id              d8f8a096-c55e-d13f-0274-faadb011b0b0
    known_secondaries       [cluster-b]
    last_dr_wal             51
    last_reindex_epoch      0
    last_wal                51
    merkle_root             842e9a56744da59fef266464a805432ca9fc4cd1
    mode                    primary
    primary_cluster_addr    n/a
    secondaries             [map[api_address:http://cluster-b:8220 cluster_address:https://cluster-b:8221 connection_status:connected last_heartbeat:2022-12-05T18:08:38Z node_id:cluster-b]]
    state                   running
    
  6. Check the replication status on cluster B.

    $ vault read -address=http://127.0.0.1:8220 sys/replication/dr/status
    Key                            Value
    ---                            -----
    cluster_id                     d8f8a096-c55e-d13f-0274-faadb011b0b0
    connection_state               ready
    known_primary_cluster_addrs    [https://cluster-a:8201]
    last_reindex_epoch             1670263708
    last_remote_wal                51
    merkle_root                    842e9a56744da59fef266464a805432ca9fc4cd1
    mode                           secondary
    primaries                      [map[api_address:http://cluster-a:8200 cluster_address:https://cluster-a:8201 connection_status:connected last_heartbeat:2022-12-05T18:10:13Z]]
    primary_cluster_addr           https://cluster-a:8201
    secondary_id                   cluster-b
    state                          stream-wals
    

The replication state on cluster A is running and its mode is primary. On cluster B, the state is stream-wals and the mode is secondary. This detail in combination with matching last_wal and last_remote_wal values confirms that the secret you created replicated to the secondary, and that the clusters synced.
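You can script this comparison. The sketch below saves each status response to a file and uses jq to check that the primary's last_wal matches the secondary's last_remote_wal; the sample payloads shown here are trimmed stand-ins for what `curl --silent http://127.0.0.1:8200/v1/sys/replication/dr/status` and the cluster B equivalent would return in this scenario:

```shell
# Sample status payloads, trimmed to the fields the check uses.
# In practice, fetch these with curl from each cluster's
# /v1/sys/replication/dr/status endpoint.
cat > status-a.json <<'EOF'
{"data": {"mode": "primary", "last_wal": 51}}
EOF
cat > status-b.json <<'EOF'
{"data": {"mode": "secondary", "state": "stream-wals", "last_remote_wal": 51}}
EOF

# Compare the primary's last_wal with the secondary's last_remote_wal.
primary_wal=$(jq -r '.data.last_wal' status-a.json)
secondary_wal=$(jq -r '.data.last_remote_wal' status-b.json)

if [ "$primary_wal" = "$secondary_wal" ]; then
  echo "in sync (WAL index $primary_wal)"
else
  echo "lagging: primary=$primary_wal secondary=$secondary_wal"
fi
```

Matching values indicate the clusters are in sync; a persistent gap suggests replication lag worth investigating.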

Alternatively, use the HTTP API to enable the KV version 2 secrets engine, write a secret, and verify the replication status.

  1. Export a VAULT_ADDR environment variable to communicate with the cluster A Vault.

    $ export VAULT_ADDR=http://127.0.0.1:8200
    
    
  2. Enable a Key/Value version 2 secrets engine at the path replicated-secrets.

    $ curl \
     --header "X-Vault-Token: $CLUSTER_A_ROOT_TOKEN" \
     --request POST \
     --data '{"type": "kv","options": {"version": 2}}' \
     $VAULT_ADDR/v1/sys/mounts/replicated-secrets
    

    This command produces no output.

  3. Put a test secret into the newly enabled secrets engine.

    $ curl \
     --header "X-Vault-Token: $CLUSTER_A_ROOT_TOKEN" \
     --request POST \
     --data '{"options": {"cas": 0},"data": {"failover": false,"secret": "984UIFBH4HK3M84"}}' \
     $VAULT_ADDR/v1/replicated-secrets/data/learn-failover \
     | jq
    

    Successful example output:

    {
       "request_id": "21e3612d-6051-1636-dc26-69207a603016",
       "lease_id": "",
       "renewable": false,
       "lease_duration": 0,
       "data": {
          "created_time": "2022-11-02T16:07:37.014805589Z",
          "custom_metadata": null,
          "deletion_time": "",
          "destroyed": false,
          "version": 1
       },
       "wrap_info": null,
       "warnings": null,
       "auth": null
    }
    
  4. Check the replication status on the primary cluster.

    $ curl --silent $VAULT_ADDR/v1/sys/replication/dr/status | jq
    

    Example output:

    {
       "request_id": "3f5c30f4-0a0c-cd48-6848-353db2a075ae",
       "lease_id": "",
       "renewable": false,
       "lease_duration": 0,
       "data": {
          "cluster_id": "8e1cc78b-ec73-eeef-ecd6-6ff5250c99bd",
          "known_secondaries": [
             "cluster-b"
          ],
          "last_dr_wal": 51,
          "last_reindex_epoch": "0",
          "last_wal": 51,
          "merkle_root": "4181c3fa3f56aa096698d8f639d5494adf4af1e1",
          "mode": "primary",
          "primary_cluster_addr": "",
          "secondaries": [
             {
           "api_address": "http://cluster-b:8220",
           "cluster_address": "https://cluster-b:8221",
             "connection_status": "connected",
             "last_heartbeat": "2022-11-02T17:26:08Z",
             "node_id": "cluster-b"
             }
          ],
          "state": "running"
       },
       "wrap_info": null,
       "warnings": null,
       "auth": null
    }
    
  5. Update the VAULT_ADDR environment variable to the cluster B Vault server address.

    $ export VAULT_ADDR=http://127.0.0.1:8220
    
  6. Check the replication status on secondary cluster B.

    $ curl --silent $VAULT_ADDR/v1/sys/replication/dr/status | jq
    
    {
    "request_id": "fd93856f-0a27-7719-9d58-96f24efb4afe",
    "lease_id": "",
    "renewable": false,
    "lease_duration": 0,
    "data": {
       "cluster_id": "8e1cc78b-ec73-eeef-ecd6-6ff5250c99bd",
       "connection_state": "ready",
       "known_primary_cluster_addrs": [
          "https://cluster-a:8201"
       ],
       "last_reindex_epoch": "1667404572",
       "last_remote_wal": 51,
       "merkle_root": "4181c3fa3f56aa096698d8f639d5494adf4af1e1",
       "mode": "secondary",
       "primaries": [
          {
           "api_address": "http://cluster-a:8200",
          "cluster_address": "https://cluster-a:8201",
          "connection_status": "connected",
          "last_heartbeat": "2022-11-02T17:28:48Z"
          }
       ],
       "primary_cluster_addr": "https://cluster-a:8201",
       "secondary_id": "cluster-b",
       "state": "stream-wals"
    },
    "wrap_info": null,
    "warnings": null,
    "auth": null
    }
    

The replication state on the primary is running and its mode is primary. On the secondary, the state is stream-wals and the mode is secondary. This detail, in combination with the matching last_wal and last_remote_wal values, shows that the secret you created replicated to the secondary and both clusters are in sync.

Alternatively, use the web UI to enable the KV version 2 secrets engine, write a secret, and verify the replication status.

  1. Access the browser tab for cluster A.

  2. Click Secrets.

  3. Click Enable new engine.

    Enable new secrets engine

  4. Select KV and click Next.

    Enable KV secrets engine

  5. In the Path field, enter replicated-secrets.

    Enable KV secrets engine details

  6. Click Enable Engine.

  7. Click Create secret.

    Create secret

  8. In the Path for this secret field, enter learn-failover.

  9. In the Secret data fields, enter failover into the key field and false into the value field.

  10. Click Add.

  11. In the second set of Secret data fields, enter secret into the key field and 984UIFBH4HK3M84 into the value field.

    Secret fields

  12. Click Save.

  13. Check the replication status in the cluster A browser tab by clicking Status and Disaster Recovery Primary.

    Replication status cluster A

  14. Check the replication status in the cluster B browser tab by clicking Status and Disaster Recovery Secondary.

    Replication status cluster B

The replication state on cluster A is running and its mode is primary. On cluster B, the state is stream-wals and the mode is secondary. These details, combined with matching last_wal and last_remote_wal values, confirm that the secret you created replicated to the secondary and that the clusters are in sync.

TIP: You can learn more about replication monitoring in the Monitoring Vault Replication tutorial.

You are now ready to continue with the failover and failback scenario.

Failover scenario

The goal of this section is to failover the current primary cluster A, and then promote the current secondary cluster B to become the new primary cluster.

You will also validate access to your secret data from the newly promoted primary, and update cluster A, setting cluster B as its new primary.

Take a snapshot

Before proceeding with any failover or failback, it's critical that you have a recent backup of the Vault data. Since the scenario environment uses Vault servers with Integrated Storage, you can take a snapshot of the cluster A Vault data, and write it to cluster-a/vault-cluster-a-snapshot.snap as a backup.

  1. Export a VAULT_ADDR environment variable to communicate with the cluster A Vault.

    $ export VAULT_ADDR=http://127.0.0.1:8200
    
  2. Take a snapshot of the cluster A data, and write it to cluster-a/vault-cluster-a-snapshot.snap.

    $ vault operator raft snapshot save cluster-a/vault-cluster-a-snapshot.snap
    

    This command produces no output.

  3. Confirm that the snapshot file is present in the cluster-a directory:

    $ ls -lh cluster-a/vault-cluster-a-snapshot.snap
    Permissions Size User  Date Modified Name
    .rw-r--r--   97k you   15 Nov 09:58  cluster-a/vault-cluster-a-snapshot.snap
    
  1. Export a VAULT_ADDR environment variable to communicate with cluster A Vault.

    $ export VAULT_ADDR=http://127.0.0.1:8200
    
  2. Take a snapshot of the cluster A data and write it to cluster-a/vault-cluster-a-snapshot.snap.

    $ curl \
       --silent \
       --header "X-Vault-Token: $CLUSTER_A_ROOT_TOKEN" \
       --request GET \
       http://127.0.0.1:8200/v1/sys/storage/raft/snapshot > cluster-a/vault-cluster-a-snapshot.snap
    

    This command produces no output.

  3. Confirm that the snapshot file is present in the cluster-a directory:

    $ ls -lh cluster-a/vault-cluster-a-snapshot.snap
    Permissions Size User  Date Modified Name
    .rw-r--r--   97k you   15 Nov 09:58  cluster-a/vault-cluster-a-snapshot.snap
    

After confirming replication status and taking a snapshot of Vault data, you are ready to begin the failover workflow.
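For recurring backups, you can timestamp the snapshot file name so repeated snapshots don't overwrite each other. A minimal sketch (the actual `vault` call is commented out because it needs a live cluster):

```shell
# Build a timestamped snapshot path so repeated backups don't collide
SNAP_DIR="cluster-a"
SNAP_FILE="${SNAP_DIR}/vault-cluster-a-$(date +%Y%m%d-%H%M%S).snap"
mkdir -p "$SNAP_DIR"
echo "saving snapshot to: $SNAP_FILE"
# vault operator raft snapshot save "$SNAP_FILE"
```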

Batch disaster recovery operation token strategy

To promote a DR secondary cluster to be the new primary, you typically need a DR operation token. However, generating a DR operation token requires a threshold of unseal keys (or recovery keys, if Vault uses auto unseal). This can be troublesome, since a cluster failure is usually caused by an unexpected incident, and coordinating among the key holders to generate the DR operation token in a timely fashion can be difficult.

As of Vault 1.4, you can create a batch DR operation token to promote and demote clusters as needed. This is a strategic operation that a Vault administrator can use to prepare for loss of the DR primary ahead of time. The batch DR operation token also has the advantage of being usable more than once, from either the primary or the secondary.

Vault version: The following steps require Vault 1.4 or later. If you are running an earlier version of Vault, follow the DR operation token generation steps in the Promote DR Secondary to Primary section.

  1. Export a VAULT_ADDR environment variable to communicate with the cluster A Vault.

    $ export VAULT_ADDR=http://127.0.0.1:8200
    
  2. Create a policy named "dr-secondary-promotion" on cluster A allowing the update capability for the sys/replication/dr/secondary/promote path. In addition, you can add a policy for the sys/replication/dr/secondary/update-primary path so that you can use the same DR operation token to update the primary cluster that the secondary cluster points to.

    $ VAULT_TOKEN=$CLUSTER_A_ROOT_TOKEN vault policy write \
        dr-secondary-promotion - <<EOF
    path "sys/replication/dr/secondary/promote" {
      capabilities = [ "update" ]
    }
    
    # To update the primary to connect
    path "sys/replication/dr/secondary/update-primary" {
        capabilities = [ "update" ]
    }
    
    # Only if using integrated storage (raft) as the storage backend
    # To read the current autopilot status
    path "sys/storage/raft/autopilot/state" {
        capabilities = [ "update" , "read" ]
    }
    EOF
    

    Successful example output:

    Success! Uploaded policy: dr-secondary-promotion
    

    NOTE: The policy on the sys/storage/raft/autopilot/state path is only required if your cluster uses Integrated Storage as its persistence layer. Refer to the Integrated Storage Autopilot tutorial to learn more about Autopilot.

  3. Verify that you enabled the "dr-secondary-promotion" policy.

    $ VAULT_TOKEN=$CLUSTER_A_ROOT_TOKEN vault policy list
    
     default
     dr-secondary-promotion
     root
    
  4. Create a batch token role named "failover-handler" with the dr-secondary-promotion policy attached by setting token_type to batch. You can't renew a batch token, so set the renewable parameter to false. Also, set the orphan parameter to true.

    $ VAULT_TOKEN=$CLUSTER_A_ROOT_TOKEN vault write auth/token/roles/failover-handler \
        allowed_policies=dr-secondary-promotion \
        orphan=true \
        renewable=false \
        token_type=batch
    
  5. Create a token for the "failover-handler" role with a time-to-live (TTL) of 8 hours.

    $ VAULT_TOKEN=$CLUSTER_A_ROOT_TOKEN vault token create \
       -role=failover-handler -ttl=8h
    

    Successful example output:

    Key                  Value
    ---                  -----
    token                hvb.AAAAAQJElHcwQOSpT6KSHtgZQWvBeU_Kki7py77MZT5Sv-LKpISO47Sgrd7kUBnggKJwM66GwjaT0fWx2oaEfyLz7Sg2X_xRpZ52Jn6tBhz6Al5C-MBIFY-p2jbH6xhIdgdszRzGHaMuKuVOb5ACswZ6enNqoDLB81CuEKalACCN-fwlT4fOohHWIFxg4fgIGcFGc0ff33
    token_accessor       n/a
    token_duration       8h
    token_renewable      false
    token_policies       ["default" "dr-secondary-promotion"]
    identity_policies    []
    policies             ["default" "dr-secondary-promotion"]
    
  6. Export the token as the value of the CLUSTER_B_DR_OP_TOKEN environment variable.

    $ export CLUSTER_B_DR_OP_TOKEN=$(VAULT_TOKEN=$CLUSTER_A_ROOT_TOKEN vault token create \
        -field=token -role=failover-handler -ttl=8h)
    
  1. Export a VAULT_ADDR environment variable to communicate with the cluster A Vault.

    $ export VAULT_ADDR=http://127.0.0.1:8200
    
  2. Create a policy named "dr-secondary-promotion" on cluster A allowing the update capability on the sys/replication/dr/secondary/promote path.

    First, create a JSON API payload with the paths and capabilities expressed as an HCL policy string.

    $ tee policy-payload.json <<EOF
    {
     "policy": "path \"sys/replication/dr/secondary/promote\" {\n  capabilities = [ \"update\" ]\n}\n\n# To update the primary to connect\npath \"sys/replication/dr/secondary/update-primary\" {\n    capabilities = [ \"update\" ]\n}\n\n# Only if using integrated storage (raft) as the storage backend\n# To read the current autopilot status\npath \"sys/storage/raft/autopilot/state\" {\n    capabilities = [ \"update\" , \"read\" ]\n}\n"
    }
    EOF
    

    NOTE: The policy on the sys/storage/raft/autopilot/state path is only required if your cluster uses Integrated Storage as its persistence layer. Refer to the Integrated Storage Autopilot tutorial to learn more about Autopilot.

    Create a new policy named dr-secondary-promotion.

    $ curl \
        --silent \
        --request PUT \
        --header "X-Vault-Token: $CLUSTER_A_ROOT_TOKEN" \
        --data @policy-payload.json \
        $VAULT_ADDR/v1/sys/policies/acl/dr-secondary-promotion
    

    This command produces no output.

  3. Create a batch token role named "failover-handler" with the dr-secondary-promotion policy attached by setting token_type to batch. You can't renew a batch token, so set the renewable parameter to false. Also, set the orphan parameter to true.

    First, create a JSON API payload for the allowed dr-secondary-promotion policy.

    $ tee role-payload.json <<EOF
    {
     "allowed_policies": [ "dr-secondary-promotion" ],
     "orphan": true,
     "renewable": false,
     "token_type": "batch"
    }
    EOF
    

    Make the API request using the role-payload.json payload that you created earlier.

    $ curl \
        --silent \
        --request PUT \
        --header "X-Vault-Token: $CLUSTER_A_ROOT_TOKEN" \
        --data @role-payload.json \
        $VAULT_ADDR/v1/auth/token/roles/failover-handler
    

    This command produces no output.

  4. Now, create a token for the "failover-handler" role.

    $ curl \
        --silent \
        --request POST \
        --header "X-Vault-Token: $CLUSTER_A_ROOT_TOKEN" \
        $VAULT_ADDR/v1/auth/token/create/failover-handler | jq ".auth"
    

    Example expected output:

    {
    "client_token": "hvb.AAAAAQIuUF0EGfdO9LbpUUMheMThYt5E5dAUd_ZZqCHFSGzwO7HpB8ZJbLSEOZUdA39dVgSW0cZUHoBnG6ZRfFXgyW_1Xkj_rw-KdnTf5oxnhj5jAleyLGeYxKHecKgv4Lx5OQ4Qcb_zrJc_6RLhnEcD4oTupjsVgooFJnD41BzOtY7S5AuhiR9ognr6gSOFD3KC5L-IaPw",
    "accessor": "",
    "policies": [
       "default",
       "dr-secondary-promotion"
    ],
    "token_policies": [
       "default",
       "dr-secondary-promotion"
    ],
    "metadata": null,
    "lease_duration": 2764800,
    "renewable": false,
    "entity_id": "",
    "token_type": "batch",
    "orphan": true,
    "mfa_requirement": null,
    "num_uses": 0
    }
    
  5. Export the token as the value of the CLUSTER_B_DR_OP_TOKEN environment variable.

    $ export CLUSTER_B_DR_OP_TOKEN=$(curl \
        --silent \
        --request POST \
        --header "X-Vault-Token: $CLUSTER_A_ROOT_TOKEN" \
        $VAULT_ADDR/v1/auth/token/create/failover-handler | jq -r ".auth.client_token")
    
  1. Access the browser tab for cluster A.

  2. Click the Policies tab, and then select Create ACL policy.

  3. Enter dr-secondary-promotion in the Name text field, and then enter the following policy in the Policy text field.

    path "sys/replication/dr/secondary/promote" {
      capabilities = [ "update" ]
    }
    
    # To update the primary to connect
    path "sys/replication/dr/secondary/update-primary" {
        capabilities = [ "update" ]
    }
    
    # Only if using integrated storage (raft) as the storage backend
    # To read the current autopilot status
    path "sys/storage/raft/autopilot/state" {
        capabilities = [ "update" , "read" ]
    }
    

    NOTE: The policy on the sys/storage/raft/autopilot/state path is only required if your cluster uses Integrated Storage as its persistence layer. Refer to the Integrated Storage Autopilot tutorial to learn more about Autopilot.

  4. Click Create Policy to complete.

  5. Click the Vault CLI shell icon (>_) to open a command shell.

    Command Shell

  6. Execute the following command to create a token role named "failover-handler" with the dr-secondary-promotion policy attached.

    vault write auth/token/roles/failover-handler \
        allowed_policies=dr-secondary-promotion \
        orphan=true \
        renewable=false \
        token_type=batch
    
  7. Now, create a token from the "failover-handler" role by executing the following command in the Vault CLI shell.

    vault write -force auth/token/create/failover-handler
    
  8. Export the token as the value of the CLUSTER_B_DR_OP_TOKEN environment variable.

    $ export CLUSTER_B_DR_OP_TOKEN=hvb.AAAAAQL2eOPXnXenNthXIFeuf4lxKH_VbFCeTja1YwCbv7iDctPQh71tbkqj3C_bJ0XVfXEGrP2kHo0n4oJVjEci7WS4qolokt5v6_b6NCpngztPGyBMdCgZyK_FBDRrxefGzyiEemHnnfVokUEwyf75lPNpKMkxKMuqbQ1P-yPcVtMS-yERmXYUNWhayn5rN3jk2-jXCvE
    

Securely store this batch token. If you need to promote the DR secondary cluster, you can use the batch DR operation token to perform the promotion. The batch token works on both primary and secondary clusters.

This eliminates the need for the unseal keys (or recovery keys if using auto unseal).

NOTE: Batch tokens have a fixed TTL, and the Vault server automatically deletes them after they expire. For example, a Vault operator can generate a batch DR operation token with a TTL equal to the duration of their shift.
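You can also tell the token types apart by prefix: batch tokens begin with hvb., while service tokens begin with hvs.. A quick sanity check before storing the token (the token value here is a placeholder, not a real token):

```shell
# Placeholder token value for illustration only
TOKEN="hvb.AAAAAQexample"

# Classify the token by its prefix
case "$TOKEN" in
  hvb.*) TOKEN_TYPE="batch" ;;
  hvs.*) TOKEN_TYPE="service" ;;
  *)     TOKEN_TYPE="unknown" ;;
esac
echo "token type: $TOKEN_TYPE"
```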

Generate a disaster recovery operation token

If you are on a version of Vault before 1.4.0, you need to create a DR operation token to perform this task.

The following process is similar to Generating a Root Token (via CLI). You must share a number of unseal keys (or recovery keys for auto unseal) equal to the threshold value. Vault generated the unseal and recovery keys when you initialized cluster A.

NOTE: If you have a DR operation batch token, you can skip the DR operation token generation and proceed to the Promote cluster B to primary status section.

Perform this operation on the DR secondary cluster (Cluster B).

  1. Start the DR operation token generation process.

    $ vault operator generate-root -dr-token -init
    

    The generated output would look like:

    A One-Time-Password has been generated for you and is shown in the OTP field.
    You will need this value to decode the resulting root token, so keep it safe.
    Nonce         b4738404-0a11-63aa-2cb6-e77dfd96946f
    Started       true
    Progress      0/3
    Complete      false
    OTP           EYHAkPQYvvz93e8iI3pg1maQ
    OTP Length    24
    

    Distribute the generated nonce to each unseal key holder.

  2. Each unseal key holder should execute the following operation with their key share to generate a DR operation token.

    Example:

    $ vault operator generate-root -dr-token \
        -nonce=b4738404-0a11-63aa-2cb6-e77dfd96946f \
        UNSEAL_KEY_OF_ORIGINAL_DR_PRIMARY_1
    
    Nonce            b4738404-0a11-63aa-2cb6-e77dfd96946f
    Started          true
    Progress         1/3
    Complete         false
    
  3. Once you reach the threshold, the output displays an encoded DR operation token.

    Example:

    $ vault operator generate-root -dr-token \
        -nonce=b4738404-0a11-63aa-2cb6-e77dfd96946f \
        UNSEAL_KEY_OF_ORIGINAL_DR_PRIMARY_3
    
    Nonce            b4738404-0a11-63aa-2cb6-e77dfd96946f
    Started          true
    Progress         3/3
    Complete         true
    Encoded Token    djw4BR1iaDUFIBxaAwpiCC1YGhQHHDMf
    
  4. Decode the generated DR operation token (Encoded Token).

    Example:

    $ vault operator generate-root -dr-token \
         -decode="djw4BR1iaDUFIBxaAwpiCC1YGhQHHDMf" \
         -otp="EYHAkPQYvvz93e8iI3pg1maQ"
    
    hvs.5xsAyncmt1OPEHhMFPMKcYAG
    
  5. Export the token as the value of the CLUSTER_B_DR_OP_TOKEN environment variable.

    $ export CLUSTER_B_DR_OP_TOKEN=hvs.5xsAyncmt1OPEHhMFPMKcYAG
    
  1. In the Manage tab of your DR secondary (Cluster B), click on Generate token. Manage tab, with generate token box highlighted

  2. In the resulting modal, notice the option to encrypt your token with a PGP key. For this tutorial, select Generate operation token. Operation token modal

  3. You must share a number of unseal keys equal to the threshold value to create a new operation token for the DR secondary. Operation token OTP

    Each unseal-key holder must perform this operation.

  4. Once you reach the key share threshold, the output displays an encoded DR operation token. If the OTP is still available, the output includes it, but if it's not shown, you need to retrieve the value from the earlier step. Click the Copy icon to copy the DR Operation Token Command. Encoded operation token

  5. Execute the copied CLI command from a terminal to generate a DR operation token. Make sure to include the OTP from the earlier step.

    Example:

    $ vault operator generate-root -dr-token \
         -otp="I4BbXfN0F2biXY53bXx4bKPwU0" \
         -decode="OhobGjUifglzc1oPEwtyfSUWEUAHAT4yPHU"
    
    hvs.5Jw2qwxzwnYrgswSoNLoYqQC
    
  6. Export the token as the value of the CLUSTER_B_DR_OP_TOKEN environment variable.

    $ export CLUSTER_B_DR_OP_TOKEN=hvs.5Jw2qwxzwnYrgswSoNLoYqQC
    

Promote cluster B to primary status

The first step in this failover workflow is to promote cluster B as a primary.

While you can demote cluster A before promoting cluster B, in production DR scenarios you might instead promote cluster B before demoting cluster A due to unavailability of cluster A.

Important: For a brief time (between promotion of cluster B and demotion of cluster A), both clusters are primary. You must redirect all traffic to cluster B once you promote it to primary. If a load balancer routes traffic to the cluster, change its rules to re-route traffic to the correct cluster. You may also need to update DNS entries for the cluster servers during this phase.

Promote cluster B to primary using the batch DR operation token.

$ VAULT_ADDR=http://127.0.0.1:8220 \
    VAULT_TOKEN=$CLUSTER_B_ROOT_TOKEN \
    vault write -f sys/replication/dr/secondary/promote \
    dr_operation_token=$CLUSTER_B_DR_OP_TOKEN

Successful example output:

WARNING! The following warnings were returned from Vault:

  * This cluster is being promoted to a replication primary. Vault will be
  unavailable for a brief period and will resume service shortly.

Export a VAULT_ADDR environment variable to address cluster B.

$ export VAULT_ADDR=http://127.0.0.1:8220

Promote cluster B to primary with the batch DR operation token.

$ curl \
    --header "X-Vault-Token: $CLUSTER_A_ROOT_TOKEN" \
    --request POST \
    --data "{\"dr_operation_token\": \"$CLUSTER_B_DR_OP_TOKEN\"}" \
    $VAULT_ADDR/v1/sys/replication/dr/secondary/promote \
    | jq

Example expected output:

{
    "request_id": "2ab31e77-235b-4738-c51a-f7e7b79fcfbb",
    "lease_id": "",
    "renewable": false,
    "lease_duration": 0,
    "data": null,
    "wrap_info": null,
    "warnings": [
        "This cluster is being promoted to a replication primary. Vault will be unavailable for a brief period and will resume service shortly."
    ],
    "auth": null
}

Promote cluster B to primary using the batch DR operation token.

  1. Access the browser tab for cluster B.

  2. Click Status.

  3. Click Disaster Recovery Secondary.

  4. Click Manage.

    Manage DR replication

  5. Click Promote cluster.

    Promote cluster

  6. Paste the batch DR operation token value that you generated earlier into the DR Operation Token field.

  7. Click Promote.

    Promote cluster B
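Whichever interface you use, you can script a post-promotion check that the cluster now reports mode primary. A sketch using a sample status payload (in practice, fetch the live response from /v1/sys/replication/dr/status on cluster B):

```shell
# Sample payload shaped like the DR status response after promotion
STATUS='{"data":{"mode":"primary","state":"running"}}'

MODE=$(echo "$STATUS" | jq -r '.data.mode')
if [ "$MODE" = "primary" ]; then
  echo "promotion complete"
else
  echo "cluster still reports mode: $MODE"
fi
```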

Demote cluster A to secondary status

Demote cluster A so that it's no longer the primary cluster.

Export a VAULT_ADDR environment variable to address cluster A.

$ export VAULT_ADDR=http://127.0.0.1:8200

Demote cluster A.

$ vault write -f sys/replication/dr/primary/demote

Successful example output:

WARNING! The following warnings were returned from Vault:

  * This cluster is being demoted to a replication secondary. Vault will be
  unavailable for a brief period and will resume service shortly.

Export a VAULT_ADDR environment variable to address cluster A.

$ export VAULT_ADDR=http://127.0.0.1:8200

Demote cluster A.

$ curl \
    --silent \
    --header "X-Vault-Token: $CLUSTER_A_ROOT_TOKEN" \
    --request POST \
    $VAULT_ADDR/v1/sys/replication/dr/primary/demote \
    | jq

Example expected output:

{
  "request_id": "59431bff-3228-40df-431a-9c70aec90b79",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": null,
  "wrap_info": null,
  "warnings": [
    "This cluster is being demoted to a replication secondary. Vault will be unavailable for a brief period and will resume service shortly."
  ],
  "auth": null
}

Demote cluster A to secondary using the UI.

  1. Access the browser tab for cluster A.

  2. Click Status.

  3. Click Disaster Recovery Primary.

  4. Click Manage.

    Manage DR replication

  5. Click Demote cluster.

    Demote cluster

  6. Confirm demotion by entering Disaster Recovery into the field.

  7. Click Confirm.

    Demote cluster A

Test access to Vault data

Now that cluster B is the primary, you can use the initial root token from cluster A to check that the Vault data is available on the new primary cluster.

  1. Export a VAULT_ADDR environment variable to address cluster B.

    $ export VAULT_ADDR=http://127.0.0.1:8220
    
  2. Check for the failover secret in replicated-secrets using the cluster A initial root token.

    $ VAULT_TOKEN=$CLUSTER_A_ROOT_TOKEN vault kv get replicated-secrets/learn-failover
    

    Successful example output:

    ============= Secret Path =============
    replicated-secrets/data/learn-failover
    
    ======= Metadata =======
    Key                Value
    ---                -----
    created_time       2022-09-20T19:15:39.772945069Z
    custom_metadata    <nil>
    deletion_time      n/a
    destroyed          false
    version            1
    
    ====== Data ======
    Key         Value
    ---         -----
    failover    false
    

    The secret is present in your newly promoted primary cluster.

  3. Create an updated version of the secret, and set the value of key failover to true.

    $ VAULT_TOKEN=$CLUSTER_A_ROOT_TOKEN \
       vault kv put replicated-secrets/learn-failover \
       failover=true
    

    Successful example output:

    ============= Secret Path =============
    replicated-secrets/data/learn-failover
    
    ======= Metadata =======
    Key                Value
    ---                -----
    created_time       2022-09-20T19:28:26.971643793Z
    custom_metadata    <nil>
    deletion_time      n/a
    destroyed          false
    version            2
    
  1. Export a VAULT_ADDR environment variable to address cluster B.

    $ export VAULT_ADDR=http://127.0.0.1:8220
    
  2. Check for the failover secret in replicated-secrets by using the cluster A initial root token.

    $ curl \
     --silent \
     --header "X-Vault-Token: $CLUSTER_A_ROOT_TOKEN" \
     $VAULT_ADDR/v1/replicated-secrets/data/learn-failover \
     | jq
    

    Successful example output:

    {
    "request_id": "61e3f3cc-3599-cc49-ed3c-4224484b9e0d",
    "lease_id": "",
    "renewable": false,
    "lease_duration": 0,
    "data": {
       "data": {
          "failover": false,
          "secret": "984UIFBH4HK3M84"
       },
       "metadata": {
          "created_time": "2022-11-29T15:41:34.282588313Z",
          "custom_metadata": null,
          "deletion_time": "",
          "destroyed": false,
          "version": 1
       }
    },
    "wrap_info": null,
    "warnings": null,
    "auth": null
    }
    

    The secret is present in your newly promoted primary cluster.

  3. Create an updated version of the secret, and set the value of key failover to true.

    $ curl \
     --silent \
     --header "X-Vault-Token: $CLUSTER_A_ROOT_TOKEN" \
     --request POST \
     --data '{"data": {"failover": "true"}}' \
     $VAULT_ADDR/v1/replicated-secrets/data/learn-failover \
     | jq
    

    Successful example output:

    {
    "request_id": "fe24aee3-c795-4af0-f20e-0aced1c92222",
    "lease_id": "",
    "renewable": false,
    "lease_duration": 0,
    "data": {
       "created_time": "2022-11-29T15:58:20.591809238Z",
       "custom_metadata": null,
       "deletion_time": "",
       "destroyed": false,
       "version": 2
    },
    "wrap_info": null,
    "warnings": null,
    "auth": null
    }
    
  1. Access the browser tab for cluster B.

  2. Notice that the UI emits a "Not authorized" error. This is because Vault invalidated your token when cluster B was promoted to primary.

  3. Click the user icon and Sign out to access the Sign in to Vault dialog.

    Sign out of UI

  4. From your terminal session, copy the value of $CLUSTER_A_ROOT_TOKEN to your system clipboard. You can use a system-specific clipboard utility like pbcopy for this step.

    $ echo $CLUSTER_A_ROOT_TOKEN | pbcopy
    
  5. Access the browser tab for cluster B again.

  6. Sign in by pasting the cluster A root token value into the Token field.

  7. Click Secrets.

  8. Click replicated-secrets/

    replicated-secrets

  9. Click learn-failover.

    Secret at version one

The secret is still at version 1.

Create an updated version of the secret, and set the value of key failover to true.

  1. Click Create new version.

    Create new secret version

  2. Enter true in the field next to the failover field. You can click the eye icon to reveal the value.

    Updated secret value

  3. Click Save to create a new version 2 of the secret.

    Updated secret value

You have created version 2 of the secret while cluster B is acting as the primary cluster.

Point demoted cluster A to new primary cluster B

Now that you have verified access to your Vault data on cluster B, update cluster A to be a DR replication secondary of cluster B.

Export a VAULT_ADDR environment variable to address cluster B.

$ export VAULT_ADDR=http://127.0.0.1:8220
  1. Generate a new secondary token and assign its value to the exported environment variable CLUSTER_A_DR_SECONDARY_TOKEN.

    $ export CLUSTER_A_DR_SECONDARY_TOKEN="$(VAULT_TOKEN=$CLUSTER_A_ROOT_TOKEN \
        vault write -field wrapping_token \
        sys/replication/dr/primary/secondary-token id=clusterA)"
    
  2. Confirm the environment variable value.

    $ echo $CLUSTER_A_DR_SECONDARY_TOKEN
    

    Successful output example:

    eyJhbGciOiJFUzUxMiIsInR5cCI6IkpXVCJ9.eyJhY2Nlc3NvciI6IiIsImFkZHIiOiJodHRwOi8vc2Vjb25kYXJ5OjgyMjAiLCJleHAiOjE2NjQyMDU5NDMsImlhdCI6MTY2NDIwNDE0MywianRpIjoiaHZzLkFVTWN3V1NkY3E5c1U4dVBDd2hEc3hTdSIsIm5iZiI6MTY2NDIwNDEzOCwidHlwZSI6IndyYXBwaW5nIn0.AXmK5r1WU_Dvz5oV8NFqc0ffeepByFzh0eL9kHIlNwaT2ZSf5TAQ5qmByRCr5zTaV21jkU13tgpjLzFBxjXBXBt3ACKIsiUQZy06YIFpV8qDdvoAPEc6IfypizlDsOQVBMOTn1eIZitLaR95_QQ3_LzaEC_o7jg3brRzJcFMQnc0ffee
    
  3. Export VAULT_ADDR environment variable to address cluster A.

    $ export VAULT_ADDR=http://127.0.0.1:8200
    
  4. Point cluster A to cluster B, so that cluster A becomes a secondary cluster of (the new primary) cluster B. Use the batch DR operation token value or DR operation token value with the secondary token value to do so.

    $ vault write sys/replication/dr/secondary/update-primary \
        dr_operation_token=$CLUSTER_B_DR_OP_TOKEN token=$CLUSTER_A_DR_SECONDARY_TOKEN
    
    WARNING! The following warnings were returned from Vault:
    
    * Vault has successfully found secondary information; it may take a while to
    perform setup tasks. Vault will be unavailable until these tasks and initial
    sync complete.
    
  5. Check replication status on cluster A using JSON output for a bit more readability.

    $ VAULT_TOKEN=$CLUSTER_A_ROOT_TOKEN \
        vault read --format=json sys/replication/status
    

    Successful output example:

    {
    "request_id": "8db7d9d2-c624-27f3-ed68-650c772b1adf",
    "lease_id": "",
    "lease_duration": 0,
    "renewable": false,
    "data": {
       "dr": {
          "cluster_id": "d8f8a096-c55e-d13f-0274-faadb011b0b0",
          "connection_state": "ready",
          "known_primary_cluster_addrs": [
          "https://cluster-b:8221"
          ],
          "last_reindex_epoch": "0",
          "last_remote_wal": 0,
          "merkle_root": "6c73ab55554f73868b504d7cae470b5dd82f3833",
          "mode": "secondary",
          "primaries": [
          {
             "api_address": "http://cluster-b:8220",
             "cluster_address": "https://cluster-b:8221",
             "connection_status": "connected",
             "last_heartbeat": "2022-12-05T18:22:29Z"
          }
          ],
          "primary_cluster_addr": "https://cluster-b:8221",
          "secondary_id": "clusterA",
          "state": "stream-wals"
       },
       "performance": {
          "mode": "disabled"
       }
    },
    "warnings": null
    }
    

    Cluster A is now in mode secondary, and its primary_cluster_addr value of https://cluster-b:8221 shows cluster B as its primary, as expected.

  6. Read the replicated-secrets/learn-failover secret with the cluster A initial root token.

    $ VAULT_ADDR=http://127.0.0.1:8220 \
      VAULT_TOKEN=$CLUSTER_A_ROOT_TOKEN \
      vault kv get replicated-secrets/learn-failover
    

    Successful example output:

    ============= Secret Path =============
    replicated-secrets/data/learn-failover
    
    ======= Metadata =======
    Key                Value
    ---                -----
    created_time       2022-12-05T18:22:10.447701139Z
    custom_metadata    <nil>
    deletion_time      n/a
    destroyed          false
    version            2
    
    ====== Data ======
    Key         Value
    ---         -----
    failover    true
    

Vault returns the expected secret value, and cluster A is now a secondary cluster to cluster B.
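The secondary token is a JWT, so you can inspect its claims (for example, its expiry) before using it. This sketch builds a stand-in token from sample claims for illustration; a real wrapping token uses unpadded base64url encoding, so decoding one may require adding padding back:

```shell
# Sample claims resembling a wrapping token payload (values illustrative)
CLAIMS='{"addr":"http://cluster-b:8220","exp":1670356461,"type":"wrapping"}'

# Assemble a stand-in token: header.claims.signature
MID=$(printf '%s' "$CLAIMS" | base64 | tr -d '\n')
TOKEN="header.${MID}.signature"

# Decode the middle (claims) segment and read the expiry claim
printf '%s' "$TOKEN" | cut -d. -f2 | base64 -d | jq -r '.exp'
```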

  1. Generate a new secondary token and assign its value to the exported environment variable CLUSTER_A_DR_SECONDARY_TOKEN.

    $ export CLUSTER_A_DR_SECONDARY_TOKEN=$(curl \
        --silent \
        --request POST \
        --header "X-Vault-Token: $CLUSTER_A_ROOT_TOKEN" \
        --data '{"id": "cluster-b"}' \
        http://127.0.0.1:8220/v1/sys/replication/dr/primary/secondary-token \
        | jq -r '.wrap_info.token')
    
  2. Confirm the environment variable value.

    $ echo $CLUSTER_A_DR_SECONDARY_TOKEN
    

    Successful output example:

    eyJhbGciOiJFUzUxMiIsInR5cCI6IkpXVCJ9.eyJhY2Nlc3NvciI6IiIsImFkZHIiOiJodHRwOi8vY2x1c3Rlci1iOjgyMjAiLCJleHAiOjE2NzAzNTY0NjEsImlhdCI6MTY3MDM1NDY2MSwianRpIjoiaHZzLmU3NnRjTGdUWEU5SW01RWhTUmJZNkhhac0ff33iZiI6MTY3MDM1NDY1NiwidHlwZSI6IndyYXBwaW5nIn0.AdkYod98oOgE0pvGnL_YMAhqd1LEx-fIVYzvH9HvIVV2cO35J3XBCeoD84gyP-CHZhh5_sdNYh8ITcWdUCz_jNYxAJIoPYXWqA-oElrczuTkksM0QuB9-95q93Zph5xtOp8NZUSybo9VyMQdzx5DzybvZCOPXPobpdnUQJp9W9c0ff33
    
  3. Point cluster A to cluster B, so that cluster A becomes a secondary cluster of (the new primary) cluster B. Use the batch DR operation token value or DR operation token value with the secondary token value to do so.

    $ curl \
     --header "X-Vault-Token: ..." \
     --request POST \
     --data "{\"dr_operation_token\":\"$CLUSTER_B_DR_OP_TOKEN\",\"token\":\"$CLUSTER_A_DR_SECONDARY_TOKEN\"}" \
     http://127.0.0.1:8200/v1/sys/replication/dr/secondary/update-primary \
     | jq
    
    {
    "request_id": "296c2b3d-44be-4ce8-939d-c624ad7bf61d",
    "lease_id": "",
    "renewable": false,
    "lease_duration": 0,
    "data": null,
    "wrap_info": null,
    "warnings": [
       "Vault has successfully found secondary information; it may take a while to perform setup tasks. Vault will be unavailable until these tasks and initial sync complete."
    ],
    "auth": null
    }
    
  4. Check replication status on cluster A.

    $ curl --silent http://127.0.0.1:8200/v1/sys/replication/dr/status | jq
    

    Successful output example:

    {
    "request_id": "2cefc9fa-d350-64ca-5446-69b4426cef8f",
    "lease_id": "",
    "renewable": false,
    "lease_duration": 0,
    "data": {
       "cluster_id": "b2829aba-bbab-216a-0e4d-e568efc121c6",
       "connection_state": "ready",
       "known_primary_cluster_addrs": [
          "https://cluster-b:8221"
       ],
       "last_reindex_epoch": "0",
       "last_remote_wal": 0,
       "merkle_root": "e5933d66e7c3a37828710f5c6fa392dec6b5f040",
       "mode": "secondary",
       "primaries": [
          {
          "api_address": "http://cluster-b:8220",
          "cluster_address": "https://cluster-b:8221",
          "connection_status": "connected",
          "last_heartbeat": "2022-12-06T19:35:25Z"
          }
       ],
       "primary_cluster_addr": "https://cluster-b:8221",
       "secondary_id": "cluster-b",
       "state": "stream-wals"
    },
    "wrap_info": null,
    "warnings": null,
    "auth": null
    }    
    

    Cluster A is now in mode secondary, and shows that it has a primary at the primary_cluster_addr value of https://cluster-b:8221 (cluster B), as expected.

  5. Read the replicated-secrets/learn-failover secret with the cluster A initial root token.

    $ curl \
        --silent \
        --header "X-Vault-Token: $CLUSTER_A_ROOT_TOKEN" \
        http://127.0.0.1:8220/v1/replicated-secrets/data/learn-failover \
        | jq
    

    Successful example output:

    {
    "request_id": "6fc03d6f-9132-a1df-262f-4546bfe0f7a8",
    "lease_id": "",
    "renewable": false,
    "lease_duration": 0,
    "data": {
       "data": {
          "failover": "true"
       },
       "metadata": {
          "created_time": "2022-12-06T18:19:02.739248459Z",
          "custom_metadata": null,
          "deletion_time": "",
          "destroyed": false,
          "version": 2
       }
    },
    "wrap_info": null,
    "warnings": null,
    "auth": null
    }
    

Vault returns the expected value, and cluster A is now a secondary cluster to cluster B.
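The secondary activation token is a JWT-style wrapping token, so you can sanity-check the primary address and expiry embedded in it before activating the secondary. The following is an illustrative sketch that rebuilds a sample claims payload locally rather than decoding a live token; the payload values are assumptions for demonstration only.

```shell
# Build a sample claims payload like the one embedded in a real wrapping token.
PAYLOAD_JSON='{"addr":"http://cluster-b:8220","exp":1670356461,"type":"wrapping"}'
B64=$(printf '%s' "$PAYLOAD_JSON" | base64 | tr -d '=\n' | tr '+/' '-_')
SAMPLE_TOKEN="eyJhbGciOiJub25lIn0.${B64}.sig"

# Take the middle (claims) segment and restore the standard base64 alphabet and padding.
CLAIMS=$(printf '%s' "$SAMPLE_TOKEN" | cut -d '.' -f 2 | tr '_-' '/+')
case $(( ${#CLAIMS} % 4 )) in
  2) CLAIMS="${CLAIMS}==" ;;
  3) CLAIMS="${CLAIMS}=" ;;
esac

# Decode and pull out the primary API address the token points at.
ADDR=$(printf '%s' "$CLAIMS" | base64 -d | sed -n 's/.*"addr":"\([^"]*\)".*/\1/p')
echo "token points at: $ADDR"
```

The same decode steps work on a real wrapping token value, which can help catch a token generated against the wrong cluster before you activate replication with it.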

  1. Access the browser tab for cluster B.

  2. Click Status and Disaster Recovery Primary.

  3. Click Secondaries.

  4. Click Add secondary.

    Add secondary

  5. Populate the Secondary ID field with cluster-a, and click Generate token.

    Generate secondary token

  6. Click Copy & Close to copy the token which you will need to enable the DR secondary cluster A. Consider keeping this value someplace other than the system clipboard so it's not overwritten before you need to use it later.

    Generated a secondary token

    NOTE: Paste the token value somewhere safe where you can retrieve it for a later step. DO NOT leave it in the system clipboard, as it will be overwritten by another copied value later.

    Now that you have a new secondary token, perform the next steps on cluster A.

  7. Access the browser tab for cluster A.

  8. Click Status and Disaster Recovery Secondary.

  9. Click Manage.

  10. Click Update to update the primary information.

  11. Paste your batch DR operation token value into the DR operation token field.

  12. Paste the new secondary activation token value in the Secondary activation token field.

  13. Click Update.

  14. Check replication status on cluster A by clicking Details.

    cluster A status

Cluster A is now in mode secondary, and shows that it has a primary at the primary_cluster_addr value of https://cluster-b:8221 (cluster B), as expected.

Failback scenario

Now it's time to failback, and restore the clusters to their initial replication state.

At this point cluster B is the new primary with cluster A as the secondary. You will now promote cluster A (the original primary) back to primary status.

Verify replication status on cluster A.

$ VAULT_ADDR=http://127.0.0.1:8200 VAULT_TOKEN=$CLUSTER_A_ROOT_TOKEN \
    vault read sys/replication/dr/status

Successful output example:

Key                            Value
---                            -----
cluster_id                     d8f8a096-c55e-d13f-0274-faadb011b0b0
connection_state               ready
known_primary_cluster_addrs    [https://cluster-b:8221]
last_reindex_epoch             0
last_remote_wal                0
merkle_root                    6c73ab55554f73868b504d7cae470b5dd82f3833
mode                           secondary
primaries                      [map[api_address:http://cluster-b:8220 cluster_address:https://cluster-b:8221 connection_status:connected last_heartbeat:2022-12-05T18:26:39Z]]
primary_cluster_addr           https://cluster-b:8221
secondary_id                   clusterA
state                          stream-wals

Verify replication status on cluster B.

$ VAULT_ADDR=http://127.0.0.1:8220 VAULT_TOKEN=$CLUSTER_A_ROOT_TOKEN \
    vault read sys/replication/dr/status

Successful output example:

Key                     Value
---                     -----
cluster_id              d14a98dc-3651-ca72-1e8a-b18cff2240ef
known_secondaries       [clusterA]
last_dr_wal             41
last_reindex_epoch      0
last_wal                41
merkle_root             5c1d0af68825331681f846a8ee6282f23f18f31e
mode                    primary
primary_cluster_addr    n/a
secondaries             [map[api_address:http://primary:8200 cluster_address:https://cluster-a:8201 connection_status:connected last_heartbeat:2022-10-31T16:29:30Z node_id:clusterA]]
state                   running

Verify replication status on cluster A.

$ curl --silent http://127.0.0.1:8200/v1/sys/replication/dr/status | jq

Successful output example:

{
  "request_id": "04133e25-c3b0-018a-377d-16f23f9a3827",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": {
    "cluster_id": "b2829aba-bbab-216a-0e4d-e568efc121c6",
    "connection_state": "ready",
    "known_primary_cluster_addrs": [
      "https://cluster-b:8221"
    ],
    "last_reindex_epoch": "0",
    "last_remote_wal": 0,
    "merkle_root": "e5933d66e7c3a37828710f5c6fa392dec6b5f040",
    "mode": "secondary",
    "primaries": [
      {
        "api_address": "http://cluster-b:8220",
        "cluster_address": "https://cluster-b:8221",
        "connection_status": "connected",
        "last_heartbeat": "2022-12-06T19:42:55Z"
      }
    ],
    "primary_cluster_addr": "https://cluster-b:8221",
    "secondary_id": "cluster-b",
    "state": "stream-wals"
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}

Verify replication status on cluster B.

$ curl --silent http://127.0.0.1:8220/v1/sys/replication/dr/status | jq

Successful output example:

{
  "request_id": "282748de-fad2-27e8-dd68-7152159f9e39",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": {
    "cluster_id": "b2829aba-bbab-216a-0e4d-e568efc121c6",
    "known_secondaries": [
      "cluster-b"
    ],
    "last_dr_wal": 43,
    "last_reindex_epoch": "0",
    "last_wal": 43,
    "merkle_root": "e5933d66e7c3a37828710f5c6fa392dec6b5f040",
    "mode": "primary",
    "primary_cluster_addr": "",
    "secondaries": [
      {
        "api_address": "http://cluster-a:8200",
        "cluster_address": "https://cluster-a:8201",
        "connection_status": "connected",
        "last_heartbeat": "2022-12-06T19:43:10Z",
        "node_id": "cluster-b"
      }
    ],
    "state": "running"
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}

From the replication status output, you can learn that cluster B is the primary, cluster A is the secondary, and replication is running and in stream-wals state.
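Rather than eyeballing the status output, you can script the check. The helper below (`dr_state` is an illustrative name, not part of Vault) extracts the state field from a DR status response; it is exercised here with a canned response rather than a live call to /v1/sys/replication/dr/status.

```shell
# Helper: extract the replication state field from a DR status response on stdin.
dr_state() {
  sed -n 's/.*"state": *"\([^"]*\)".*/\1/p' | head -n 1
}

# Against live clusters you could poll the secondary until it reaches stream-wals:
#   until curl --silent http://127.0.0.1:8200/v1/sys/replication/dr/status \
#     | dr_state | grep -q '^stream-wals$'; do sleep 2; done

# Exercise the helper with a canned response instead of a live endpoint.
SAMPLE='{"data":{"mode":"secondary","state":"stream-wals"}}'
STATE=$(printf '%s' "$SAMPLE" | dr_state)
echo "replication state: $STATE"
```

A loop like the commented one is useful in automation, since the secondary reports intermediate states while the initial sync completes.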

You can now start the failback workflow.

Promote cluster A to primary status

Begin failback by promoting cluster A to primary status.

NOTE: At this point, you should begin redirecting all client traffic back to cluster A once it is promoted to primary.

Use the batch DR operation token value from the CLUSTER_B_DR_OP_TOKEN environment variable to promote cluster A back to primary status.

$ VAULT_ADDR=http://127.0.0.1:8200 VAULT_TOKEN=$CLUSTER_A_ROOT_TOKEN \
    vault write -f sys/replication/dr/secondary/promote \
    dr_operation_token=$CLUSTER_B_DR_OP_TOKEN

Successful output example:

WARNING! The following warnings were returned from Vault:

  * This cluster is being promoted to a replication primary. Vault will be
  unavailable for a brief period and will resume service shortly.

Alternatively, use the API with the same batch DR operation token value to promote cluster A back to primary status.

Create a payload.json and pass in the CLUSTER_B_DR_OP_TOKEN token value.

$ tee payload.json << EOF
{ "dr_operation_token": "$CLUSTER_B_DR_OP_TOKEN" }
EOF
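Note that the here-document delimiter (EOF) is unquoted, so the shell expands $CLUSTER_B_DR_OP_TOKEN before the file is written; quoting the delimiter would write the literal variable name instead. A self-contained demonstration with a placeholder value (not a real DR operation token):

```shell
DEMO_TOKEN='hvb.placeholder-token'   # placeholder, not a real DR operation token

# Unquoted delimiter: the shell expands the variable into the file.
tee expanded.json > /dev/null << EOF
{ "dr_operation_token": "$DEMO_TOKEN" }
EOF

# Quoted delimiter: the literal text $DEMO_TOKEN is written instead.
tee literal.json > /dev/null << 'EOF'
{ "dr_operation_token": "$DEMO_TOKEN" }
EOF

N_EXPANDED=$(grep -c 'hvb.placeholder-token' expanded.json)
N_LITERAL=$(grep -c 'hvb.placeholder-token' literal.json || true)
echo "expanded file matches: $N_EXPANDED, literal file matches: $N_LITERAL"
rm -f expanded.json literal.json
```

If the payload ever contains a literal variable name instead of a token, an unintentionally quoted delimiter is the usual cause.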

Promote cluster A to primary.

$ curl \
    --silent \
    --header "X-Vault-Token: $CLUSTER_A_ROOT_TOKEN" \
    --request POST \
    --data @payload.json \
    http://127.0.0.1:8200/v1/sys/replication/dr/secondary/promote \
    | jq    

Successful output example:

{
  "request_id": "8dee1741-532f-c01b-78d1-90ece034f09a",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": null,
  "wrap_info": null,
  "warnings": [
    "This cluster is being promoted to a replication primary. Vault will be unavailable for a brief period and will resume service shortly."
  ],
  "auth": null
}

  1. Access the browser tab for cluster A.

  2. In Disaster Recovery, click Manage.

    Manage cluster A

  3. Click Promote.

    Promote cluster A begin

  4. Enter the batch DR operation token value into the DR Operation Token field.

  5. Click Promote.

    Promote cluster A complete

Demote cluster B to secondary status

Demote cluster B back to secondary status.

$ VAULT_ADDR=http://127.0.0.1:8220 VAULT_TOKEN=$CLUSTER_A_ROOT_TOKEN \
    vault write -f sys/replication/dr/primary/demote

Successful output example:

WARNING! The following warnings were returned from Vault:

  * This cluster is being demoted to a replication secondary. Vault will be
  unavailable for a brief period and will resume service shortly.

Alternatively, demote cluster B back to secondary status using the API.

$ curl \
    --silent \
    --header "X-Vault-Token: $CLUSTER_A_ROOT_TOKEN" \
    --request POST \
    http://127.0.0.1:8220/v1/sys/replication/dr/primary/demote \
    | jq

Successful output example:

{
  "request_id": "5eeb2fd0-598d-9b67-e82d-dfdaf41a96f8",
  "lease_id": "",
  "renewable": false,
  "lease_duration": 0,
  "data": null,
  "wrap_info": null,
  "warnings": [
    "This cluster is being demoted to a replication secondary. Vault will be unavailable for a brief period and will resume service shortly."
  ],
  "auth": null
}

  1. Access the browser tab for cluster B.

  2. In Disaster Recovery, click Manage.

    Manage cluster B

  3. Click Demote.

    Demote cluster B begin

  4. Confirm demotion by entering Disaster Recovery into the field.

  5. Click Demote.

    Demote cluster B complete

Confirm replication status and access to data

The goal of this section is to check the replication status of clusters A and B, and read the secret data to confirm the failback.

  1. Verify replication status on cluster A.

    $ VAULT_ADDR=http://127.0.0.1:8200 VAULT_TOKEN=$CLUSTER_A_ROOT_TOKEN \
       vault read sys/replication/dr/status
    

    Successful output example:

    Key                     Value
    ---                     -----
    cluster_id              d14a98dc-3651-ca72-1e8a-b18cff2240ef
    known_secondaries       []
    last_dr_wal             71
    last_reindex_epoch      0
    last_wal                71
    merkle_root             af89e30c16ea03009df256991bf3c6ec4e8b390a
    mode                    primary
    primary_cluster_addr    n/a
    secondaries             []
    state                   running
    
  2. Verify replication state on cluster B.

    $ VAULT_ADDR=http://127.0.0.1:8220 VAULT_TOKEN=$CLUSTER_A_ROOT_TOKEN \
       vault read sys/replication/dr/status
    

    Successful output example:

    Key                            Value
    ---                            -----
    cluster_id                     d14a98dc-3651-ca72-1e8a-b18cff2240ef
    known_primary_cluster_addrs    [https://cluster-a:8201]
    last_reindex_epoch             0
    merkle_root                    5c1d0af68825331681f846a8ee6282f23f18f31e
    mode                           secondary
    primaries                      []
    primary_cluster_addr           n/a
    secondary_id                   n/a
    state                          idle
    

    The status indicates that the clusters are replicating again in their original state with cluster A being the primary and cluster B the secondary.

  3. Try to update the secret data in cluster A.

    $ VAULT_ADDR=http://127.0.0.1:8200 VAULT_TOKEN=$CLUSTER_A_ROOT_TOKEN \
       vault kv put replicated-secrets/learn-failover failover=false
    

    Successful example output:

    ============= Secret Path =============
    replicated-secrets/data/learn-failover
    
    ======= Metadata =======
    Key                Value
    ---                -----
    created_time       2022-09-26T17:03:48.661780123Z
    custom_metadata    <nil>
    deletion_time      n/a
    destroyed          false
    version            3
    

    You have created a third version of the secret while cluster A is once again acting as the primary cluster.
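You can also confirm the write programmatically: the Vault CLI supports -format=json, and the new version number appears in the returned metadata. A sketch using a canned response in place of live output:

```shell
# Canned metadata, shaped like `vault kv put -format=json` output.
RESPONSE='{"data":{"created_time":"2022-12-06T20:05:28Z","destroyed":false,"version":3}}'

# Pull the version number out of the metadata.
VERSION=$(printf '%s' "$RESPONSE" | sed -n 's/.*"version": *\([0-9][0-9]*\).*/\1/p')
echo "secret is now at version $VERSION"
```

Asserting on the version field like this makes a scripted failback check fail loudly if the write silently went to the wrong cluster.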

  1. Verify replication status on cluster A.

    $ curl --silent http://127.0.0.1:8200/v1/sys/replication/dr/status | jq
    

    Successful output example:

    {
    "request_id": "789af2e5-644f-e2bd-e7bd-42bdb46d1c2b",
    "lease_id": "",
    "renewable": false,
    "lease_duration": 0,
    "data": {
       "cluster_id": "b2829aba-bbab-216a-0e4d-e568efc121c6",
       "known_secondaries": [],
       "last_dr_wal": 72,
       "last_reindex_epoch": "0",
       "last_wal": 72,
       "merkle_root": "7fafaa1c4ec91ab509c6312d7c89caa49e092984",
       "mode": "primary",
       "primary_cluster_addr": "",
       "secondaries": [],
       "state": "running"
    },
    "wrap_info": null,
    "warnings": null,
    "auth": null
    }
    
  2. Verify replication state on cluster B.

    $ curl --silent http://127.0.0.1:8220/v1/sys/replication/dr/status | jq
    

    Successful output example:

    {
    "request_id": "b0d4f8e7-b169-aa12-8ae7-59f21b280a2f",
    "lease_id": "",
    "renewable": false,
    "lease_duration": 0,
    "data": {
       "cluster_id": "b2829aba-bbab-216a-0e4d-e568efc121c6",
       "known_primary_cluster_addrs": [
          "https://cluster-a:8201"
       ],
       "last_reindex_epoch": "0",
       "merkle_root": "e5933d66e7c3a37828710f5c6fa392dec6b5f040",
       "mode": "secondary",
       "primaries": [],
       "primary_cluster_addr": "",
       "secondary_id": "",
       "state": "idle"
    },
    "wrap_info": null,
    "warnings": null,
    "auth": null
    }
    

    The status indicates that the clusters are replicating again in their original state with cluster A being the primary and cluster B the secondary.

  3. Create a payload for updating the secret.

    $ tee payload.json << EOF
    {
       "data": {
          "failover": false,
          "secret": "984UIFBH4HK3M84"
       }
    }
    EOF
    
  4. Try to update the secret data in cluster A.

    $ curl \
        --silent \
        --header "X-Vault-Token: $CLUSTER_A_ROOT_TOKEN" \
        --request POST \
        --data @payload.json \
       http://127.0.0.1:8200/v1/replicated-secrets/data/learn-failover \
       | jq
    

    Successful example output:

    {
    "request_id": "cba26c50-1b2b-8245-74d6-63a755473893",
    "lease_id": "",
    "renewable": false,
    "lease_duration": 0,
    "data": {
       "created_time": "2022-12-06T20:05:28.116566722Z",
       "custom_metadata": null,
       "deletion_time": "",
       "destroyed": false,
       "version": 3
    },
    "wrap_info": null,
    "warnings": null,
    "auth": null
    }
    

    You have created a third version of the secret while cluster A is once again acting as the primary cluster.

  1. From your terminal session, copy the value of $CLUSTER_A_ROOT_TOKEN to your system clipboard. You can use a system-specific clipboard utility like pbcopy for this step.

    $ echo $CLUSTER_A_ROOT_TOKEN | pbcopy
    
  2. Access the browser tab for cluster A and sign in by pasting the root token value into the Token field.

  3. Click Secrets.

  4. Click replicated-secrets/

    replicated-secrets

  5. Click learn-failover.

    Secret at version two

    The secret is at version 2.

    Create a third version of the secret, and set the value of key failover to false.

  6. Click Create new version.

    Create third secret version

  7. Click the eye icon beside the failover field to reveal the current value, true.

    Reveal value

  8. Enter false in the field next to the failover field.

    Updated secret value

  9. Click Save to create version 3 of the secret.

    Updated secret value created

    You have created a third version of the secret while cluster A is once again acting as the primary cluster.

Update replication primary on cluster B

The goal of this section is to update cluster B and point it to cluster A as the new primary cluster.

  1. Generate a secondary token on cluster A and assign its value to the exported environment variable CLUSTER_A_DR_SECONDARY_TOKEN.

    $ export CLUSTER_A_DR_SECONDARY_TOKEN="$(VAULT_ADDR=http://127.0.0.1:8200 \
       VAULT_TOKEN=$CLUSTER_A_ROOT_TOKEN \
       vault write -field wrapping_token \
       sys/replication/dr/primary/secondary-token id=ClusterB)"
    

    This command produces no output.
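Because the command is silent, a quick emptiness check confirms the wrapping token was actually captured before you rely on it. A sketch; the placeholder value below stands in for the token exported above:

```shell
# Placeholder standing in for the value exported by the command above.
CLUSTER_A_DR_SECONDARY_TOKEN='eyJhbGciOiJFUzUxMiIsInR5cCI6IkpXVCJ9.example.sig'

if [ -n "${CLUSTER_A_DR_SECONDARY_TOKEN}" ]; then
  echo "wrapping token captured (${#CLUSTER_A_DR_SECONDARY_TOKEN} characters)"
else
  echo "wrapping token is empty; re-run secondary-token generation" >&2
fi
```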

  1. DR operation tokens are one-time use, so you need to generate a new one for this step. Use environment variables to override the Vault address and root token values, and initialize the DR operation token generation.

    $ VAULT_ADDR=http://127.0.0.1:8220 \
       VAULT_TOKEN=$CLUSTER_B_ROOT_TOKEN \
       vault operator generate-root -dr-token -init
    

    Successful output example:

     A One-Time-Password has been generated for you and is shown in the OTP field.
     You will need this value to decode the resulting root token, so keep it safe.
     Nonce         7854eb3e-7338-b4ad-2408-6e598a5684e9
     Started       true
     Progress      0/1
     Complete      false
     OTP           3BYcM71jcwIYDm8auxIFvHhATbRM
     OTP Length    28
    
  2. Export the OTP value from the earlier output as the environment variable CLUSTER_B_DR_OTP.

    $ export CLUSTER_B_DR_OTP=3BYcM71jcwIYDm8auxIFvHhATbRM
    
  3. Display the cluster A unseal key value to use in the next step:

    $ echo $CLUSTER_A_UNSEAL_KEY
    Sx4AOb7X5ShyQ97sL5g7nhUn2l+IYv64GYucApVdm44=
    
  4. Generate the encoded token value.

    $ VAULT_ADDR=http://127.0.0.1:8220 \
       VAULT_TOKEN=$CLUSTER_B_ROOT_TOKEN \
       vault operator generate-root -dr-token
    

    When prompted, enter the unseal key from cluster A.

    Successful output example:

     Nonce            7854eb3e-7338-b4ad-2408-6e598a5684e9
     Started          true
     Progress         1/1
     Complete         true
     Encoded Token    WzQqTQdRfzohHT4hAyxMNjcSfxAuIVgVPgQ7KQ
    
  5. Export the "Encoded Token" value from the earlier output as the environment variable CLUSTER_B_DR_ENCODED_TOKEN.

    $ export CLUSTER_B_DR_ENCODED_TOKEN=WzQqTQdRfzohHT4hAyxMNjcSfxAuIVgVPgQ7KQ
    
  6. Complete the DR operation token generation, and export the resulting token value as the environment variable CLUSTER_B_DR_OP_TOKEN for later use.

    $ export CLUSTER_B_DR_OP_TOKEN=$(VAULT_ADDR=http://127.0.0.1:8220 \
       VAULT_TOKEN=$CLUSTER_B_ROOT_TOKEN \
       vault operator generate-root \
       -dr-token \
       -otp=$CLUSTER_B_DR_OTP \
       -decode=$CLUSTER_B_DR_ENCODED_TOKEN)
    
  1. Echo the CLUSTER_B_DR_OP_TOKEN environment variable to confirm that it's set.

    $ echo $CLUSTER_B_DR_OP_TOKEN
    

    Successful output example:

    hvb.AAAAAQLE0eZ9DZREm3xrKWoc-KpbejFcyr8YEyiiORdxSMt_PKwDT-b_9AUavF3w4NcVQJQyO4-BRfahEb0h9GEE9vI-EidU2RjGd7UFdrXph6iSchNtMjC7sVFqH_Y558yx_D_LN1bSPA8vsq-ADzKnnvw5rACn4BREv7QdrPBytX2JkStDrevVzLlWsHS1UF0xFACxSQ
    
  2. Update cluster B so that it uses cluster A as the new primary cluster.

    $ VAULT_ADDR=http://127.0.0.1:8220 \
          vault write sys/replication/dr/secondary/update-primary \
          dr_operation_token=$CLUSTER_B_DR_OP_TOKEN token=$CLUSTER_A_DR_SECONDARY_TOKEN
    
    WARNING! The following warnings were returned from Vault:
    
    * Vault has successfully found secondary information; it may take a while to
    perform setup tasks. Vault will be unavailable until these tasks and initial
    sync complete.
    
  3. Now check replication status on cluster B.

    $ VAULT_ADDR=http://127.0.0.1:8220 VAULT_TOKEN=$CLUSTER_A_ROOT_TOKEN \
       vault read sys/replication/dr/status
    

    Successful example output:

    Key                            Value
    ---                            -----
    cluster_id                     e3619c22-8958-8d24-c374-c5630988b300
    known_primary_cluster_addrs    [https://cluster-a:8201]
    last_reindex_epoch             0
    merkle_root                    7a82bc8a3bc5be71342661424fae40cee94786a5
    mode                           secondary
    primaries                      []
    primary_cluster_addr           n/a
    secondary_id                   n/a
    state                          idle
    

    The output shows that cluster B is now a secondary with a known primary cluster address that matches cluster A.

Point cluster B to cluster A so that cluster B becomes a secondary cluster of the new primary cluster A. Use the newly generated DR operation token and secondary token values.

  1. Generate a secondary token on cluster A and assign its value to the exported environment variable CLUSTER_A_DR_SECONDARY_TOKEN.

    
    $ export CLUSTER_A_DR_SECONDARY_TOKEN=$(curl \
        --silent \
        --request POST \
        --header "X-Vault-Token: $CLUSTER_A_ROOT_TOKEN" \
        --data '{"id": "ClusterB"}' \
        http://127.0.0.1:8200/v1/sys/replication/dr/primary/secondary-token \
        | jq -r '.wrap_info.token' | tr -d '"')
    

    This command produces no output.

  2. Echo the CLUSTER_A_DR_SECONDARY_TOKEN environment variable to confirm that it's set.

    $ echo $CLUSTER_A_DR_SECONDARY_TOKEN
    

    Successful output example:

    eyJhbGciOiJFUzUxMiIsInR5cCI6IkpXVCJ9.eyJhY2Nlc3NvciI6IiIsImFkZHIiOiJodHRwOi8vY2x1c3Rlci1hOjgyMDAiLCJleHAiOjE2NzAzNTkxODcsImlhdCI6MTY3MDM1NzM4NywianRpIjoiaHZzLk83RTJER24xZjZNQ2tCRkpPZFUyVkxpSCIsIm5iZiI6MTY3MDM1NzM4MiwidHlwZSI6IndyYXBwaW5nIn0.AYmjdeAD4B8km5psHnj9q4xeF9Mc0ff33C-vXiT2Oc3aYlLUWl_B0I8b4q8RhkgDfOvLjQ3l0xdRlmb3A5tS9-RTAETU9YXES64hzgJ_v-IbxSnhYNUBckbyqUIf0NDk45KJaEYA454C9S30Qun0AmOiNmgk7iStNIizyN0B39c0ff33
    
  3. Echo the CLUSTER_B_DR_OP_TOKEN environment variable to confirm that it's set.

    $ echo $CLUSTER_B_DR_OP_TOKEN
    

    Successful output example:

    hvb.AAAAAQLE0eZ9DZREm3xrKWoc-KpbejFcyr8YEyiiORdxSMt_PKwDT-b_9AUavF3w4NcVQJQyO4-BRfahEb0h9GEE9vI-EidU2RjGd7UFdrXph6iSchNtMjC7sVFqH_Y558yx_D_LN1bSPA8vsq-ADzKnnvw5rACn4BREv7QdrPBytX2JkStDrevVzLlWsHS1UF0xFACxSQ
    
  4. Create a payload.json file to pass in the token values.

    $ tee payload.json << EOF
    {
       "dr_operation_token": "$CLUSTER_B_DR_OP_TOKEN",
       "token": "$CLUSTER_A_DR_SECONDARY_TOKEN"
    }
    EOF
    
  5. Update cluster B so that it uses cluster A as the new primary cluster.

    $ curl \
        --silent \
        --header "X-Vault-Token: $CLUSTER_A_ROOT_TOKEN" \
        --request POST \
        --data @payload.json \
     http://127.0.0.1:8220/v1/sys/replication/dr/secondary/update-primary \
     | jq
    
    {
    "request_id": "d555aab0-381d-51b7-6bab-3f79cd6b1156",
    "lease_id": "",
    "renewable": false,
    "lease_duration": 0,
    "data": null,
    "wrap_info": null,
    "warnings": [
       "Vault has successfully found secondary information; it may take a while to perform setup tasks. Vault will be unavailable until these tasks and initial sync complete."
    ],
    "auth": null
    }
    
  6. Now check replication status on cluster B.

    $ curl --silent http://127.0.0.1:8220/v1/sys/replication/dr/status | jq
    

    Successful example output:

    {
    "request_id": "65a96e7b-b9ae-704a-4607-ac4ccd55c3c7",
    "lease_id": "",
    "renewable": false,
    "lease_duration": 0,
    "data": {
       "cluster_id": "b2829aba-bbab-216a-0e4d-e568efc121c6",
       "connection_state": "ready",
       "known_primary_cluster_addrs": [
          "https://cluster-a:8201"
       ],
       "last_reindex_epoch": "0",
       "last_remote_wal": 0,
       "merkle_root": "4f4dd88e48c539279fd5e120f519892da2687732",
       "mode": "secondary",
       "primaries": [
          {
          "api_address": "http://cluster-a:8200",
          "cluster_address": "https://cluster-a:8201",
          "connection_status": "connected",
          "last_heartbeat": "2022-12-06T20:18:36Z"
          }
       ],
       "primary_cluster_addr": "https://cluster-a:8201",
       "secondary_id": "ClusterB",
       "state": "stream-wals"
    },
    "wrap_info": null,
    "warnings": null,
    "auth": null
    }
    

    The output shows that cluster B is now a secondary with a known primary cluster address that matches cluster A.
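The secondary_id in the status output should match the id you supplied when generating the secondary token (ClusterB here). A sketch of scripting that check against a captured response; the sample below is canned data, not a live call:

```shell
EXPECTED_ID='ClusterB'

# Canned fragment of a /v1/sys/replication/dr/status response.
RESPONSE='{"data":{"mode":"secondary","secondary_id":"ClusterB","state":"stream-wals"}}'

GOT_ID=$(printf '%s' "$RESPONSE" | sed -n 's/.*"secondary_id": *"\([^"]*\)".*/\1/p')
if [ "$GOT_ID" = "$EXPECTED_ID" ]; then
  echo "secondary id matches: $GOT_ID"
else
  echo "unexpected secondary id: $GOT_ID" >&2
fi
```

A mismatch here usually means the cluster was activated with a token generated under a different id, which is worth catching before you hand traffic back.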

  1. Access the browser tab for cluster A.

  2. Click Status.

  3. Click Disaster Recovery Primary.

  4. Click Manage.

  5. In the Known Secondaries section, click Add secondary.

    Adding a secondary from the DR primary

  6. Populate the Secondary ID field with cluster-b, and click Generate token.

    Generating a secondary token

  7. Click Copy & Close to copy the token which you will need to enable the DR secondary cluster B. Consider keeping this value someplace other than the system clipboard so it's not overwritten before you need to use it later.

    Generated a secondary token

    NOTE: Paste the token value somewhere safe where you can retrieve it for a later step. DO NOT leave it in the system clipboard, as it will be overwritten by another copied value later.

You must perform the following operations on cluster B.

  1. Access the browser tab for cluster B.

  2. Click Status.

  3. Click Disaster Recovery Primary.

  4. Click Manage.

  5. Paste your batch DR operation token value into the DR operation token field.

  6. Paste the token you copied from the primary in the Secondary activation token field.

    Choosing 'Disaster Recovery' for a secondary mode

  7. Click Update. (Warning: This immediately clears all data in the secondary cluster.)

  8. Click the Details tab to see replication details.

    Replication status cluster B

You have completed the failover and failback scenario with the Vault DR Replication feature.

Clean up

  1. Stop the Docker containers (this will also automatically remove them).

    $ docker stop vault-enterprise-cluster-a vault-enterprise-cluster-b
    vault-enterprise-cluster-a
    vault-enterprise-cluster-b
    
  2. Remove the Docker network.

    $ docker network rm learn-vault
    learn-vault
    
  3. Change into your home directory

    $ cd ..
    
  4. Remove the learn-vault-lab project directory.

    $ rm -rf "${HC_LEARN_LAB}"
    
  5. Unset the environment variables

    $ unset \
    CLUSTER_A_UNSEAL_KEY \
    CLUSTER_A_ROOT_TOKEN \
    CLUSTER_A_DR_OTP \
    CLUSTER_A_DR_ENCODED_TOKEN \
    CLUSTER_A_DR_SECONDARY_TOKEN \
    CLUSTER_A_DR_OP_TOKEN \
    CLUSTER_B_UNSEAL_KEY \
    CLUSTER_B_ROOT_TOKEN \
    CLUSTER_B_DR_OP_TOKEN \
    MY_VAULT_LICENSE \
    VAULT_ADDR \
    VAULT_TOKEN
    

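One refinement to the rm -rf step above: if HC_LEARN_LAB were unset, the quoted command would expand to rm -rf "" and fail unhelpfully. The shell's ${VAR:?} expansion turns that into an explicit abort with a message, a useful habit in cleanup scripts. A self-contained sketch using a throwaway directory (HC_LEARN_LAB_DEMO is a stand-in name):

```shell
# Throwaway directory standing in for the real lab directory.
HC_LEARN_LAB_DEMO="$(mktemp -d)"

# ${VAR:?message} aborts with an error if VAR is unset or empty,
# instead of quietly expanding to an empty string.
rm -rf "${HC_LEARN_LAB_DEMO:?HC_LEARN_LAB_DEMO is not set}"

[ ! -d "$HC_LEARN_LAB_DEMO" ] && echo "lab directory removed"
```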
Summary

You have learned how to establish a DR replication configuration between a primary and secondary cluster. You have also learned the essential workflow for failover from an existing primary cluster and failback to the original cluster state after operating in a failed over state.

Next steps

You can learn more about replication, including popular topics such as monitoring replication, setting up Performance Replication, and Performance Replication with HashiCorp Cloud Platform (HCP) Vault.

Help and resources

  • Disaster Recovery Replication Setup

  • DR Replication API documentation
