Vault on Kubernetes Reference Architecture
This document outlines a reference architecture for deployment of HashiCorp Vault in the context of the Kubernetes cluster scheduler. Those interested in deploying a Vault service consistent with these recommendations should read the upcoming Vault on Kubernetes Deployment Guide which will include instructions on the usage of the official HashiCorp Vault Helm Chart.
As of Vault 1.4, this document supports both Vault Open Source and Vault Enterprise deployments utilizing HashiCorp Consul Enterprise as the persistent storage layer. Readers may want to refer to the non-Kubernetes Consul Reference Architecture and Consul Deployment Guide as general references. The recommendations in this document related to the Consul deployment are heavily informed by those documents.
The following topics are addressed in this guide:
- Kubernetes cluster features & configuration
- Infrastructure requirements
- Infrastructure Design
- Help and Reference
Kubernetes cluster features & configuration
Federation and cluster-level high availability
This document details designs for a resilient, reliable, and highly-available Vault service on a single, dedicated Kubernetes cluster deployment via effective use of availability zones and other forms of in-region datacenter redundancies.
Future versions of this document will include designs optimized for Vault Enterprise Disaster Recovery Replication and multi-datacenter Performance Replication.
Third-party redundancy/resiliency tooling
There is no expectation that your Kubernetes cluster has been configured for Kubernetes-specific forms of multi-datacenter redundancy such as Federation v2 or other third-party tools for improving Kubernetes reliability and disaster recovery. Future updates to this Reference Architecture may take these other technologies into account.
Secure scheduling via RBAC and NodeRestrictions
This document details various cluster scheduling constructs used to ensure the proper spread of Vault and Consul Pods amongst a pool of Nodes in multiple availability zones. The current recommendation is to run Vault and Consul on their own dedicated cluster; however, the same constructs also ensure, for security purposes, that the Vault and Consul Pods do not share a Node with non-Vault and non-Consul Pods. These constructs rely on Kubernetes Node Labels. Historically, the kubelets running on Nodes have been given privileges to modify their own Node labels and sometimes even the labels of other Nodes. This opens the possibility of rogue operators or workloads modifying Node Labels in such a way as to subvert isolation of Consul and Vault workloads from other workloads on the cluster. For this reason, the use of Kubernetes RBAC and the NodeRestriction admission plugin is required, though their configuration is not covered in this doc.
A dedicated Kubernetes cluster for the Vault and Consul deployment mitigates some of these security concerns; however, it does not preclude the use of a multi-tenant cluster. Running workloads on the same cluster as the Vault and Consul deployments does introduce a variety of security concerns as noted above. A future update to this reference architecture will address these concerns.
More details may be found in the upcoming Vault on Kubernetes Deployment Guide.
Network-attached storage volumes
For the purposes of this Reference Architecture, the Consul Pods have a mandatory requirement of durable storage via PersistentVolumes and PersistentVolumeClaims. It is also strongly encouraged, bordering on a hard requirement, that those volumes be network-attached and capable of being re-bound to new Pods should the original Pods holding the volume claim go offline due to permanent Node failure. Although it is possible to deploy this Reference Architecture using volumes which are not capable of being re-bound to replacement Pods (e.g., hostPath), doing so will significantly reduce the effectiveness of deploying across multiple availability zones for both Consul and Vault and is thus not recommended.
Examples of network-attached storage which would meet the above requirements include AWS EBS, GCE PD, Azure Disk, and Cinder volumes. Please see the Persistent Volumes documentation for more details.
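As a concrete illustration, the PersistentVolumeClaim below sketches a data volume for a single Consul server, assuming an AWS EBS-backed storage class named 'gp2'; the claim name, class name, and size are illustrative, and in practice such claims would normally be generated by the StatefulSet's volumeClaimTemplates.

```yaml
# Illustrative PVC for one Consul server's data volume.
# The storage class "gp2" (AWS EBS) and the 50Gi size are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-consul-server-0
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2
  resources:
    requests:
      storage: 50Gi
```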
Infrastructure requirements
Dedicated Nodes/kubelets
Vault Pods should be scheduled to a dedicated Kubernetes cluster on which no other workloads can be scheduled. This prevents the possibility of co-tenant rogue workloads attempting to penetrate protections provided by the Node operating system and container runtime to gain access to Vault kernel-locked memory or Consul memory and persistent storage volumes. See below for details.
Sizing of Kubernetes Nodes (kubelets)
The suggested hardware requirements for kubelets hosting Consul and Vault do not vary substantially from the recommendations made in the non-Kubernetes Reference Architecture documents for Consul and Vault; those documents remain the canonical source for sizing information.
The sizing tables as specified at the time of this writing have been reproduced below for convenience:
Sizing for Consul Nodes:
| Size | CPU | Memory | Disk | Typical Cloud Instance Types |
|-------|-----------|--------------|--------|------------------------------|
| Small | 2-4 core | 8-16 GB RAM | 50 GB | AWS: m5.large, m5.xlarge; Azure: Standard_D2_v3, Standard_D4_v3; GCP: n2-standard-2, n2-standard-4 |
| Large | 8-16 core | 32-64 GB RAM | 100 GB | AWS: m5.2xlarge, m5.4xlarge; Azure: Standard_D8_v3, Standard_D16_v3; GCP: n2-standard-8, n2-standard-16 |
Sizing for Vault Nodes:
| Size | CPU | Memory | Disk | Typical Cloud Instance Types |
|-------|----------|--------------|-------|------------------------------|
| Small | 2 core | 4-8 GB RAM | 25 GB | AWS: m5.large; Azure: Standard_D2_v3; GCE: n1-standard-2, n1-standard-4 |
| Large | 4-8 core | 16-32 GB RAM | 50 GB | AWS: m5.xlarge, m5.2xlarge; Azure: Standard_D4_v3, Standard_D8_v3; GCE: n1-standard-8, n1-standard-16 |
Control plane nodes
The Kubernetes community generally recommends against running non-administrative workloads on control plane/master nodes. Most Kubernetes cluster installers and cloud-hosted Kubernetes clusters disallow scheduling general workloads on the control plane. Even if your cluster allows it, Vault and Consul Pods should not be scheduled on the control plane. As general workloads, neither Vault nor Consul place unusual demands on the control plane relative to other general workloads. For these reasons control plane node sizing is considered outside of the scope of this document and thus no specific recommendations are offered.
Infrastructure Design
Baseline Node layout
The following diagram represents the initial configuration of Kubernetes Nodes, without application of any of the constructs that the following sections will leverage for scheduling our Consul and Vault Pods. Although Nodes at the bottom of the diagram are set off visually from Nodes at the top of the diagram, at this point they represent identical configurations. Note there are three availability zones: Availability Zone 0, Availability Zone 1, and Availability Zone 2.
This is the baseline configuration upon which following sections of this doc will build.
Consul Server Pods and Vault Server Pods
Limiting our Consul Server Pods and Vault Server Pods to a subset of Nodes
Working from the non-configured baseline in the previous diagram, the set of Nodes must first be partitioned into those where Consul Server Pods and Vault Server Pods will run and those which are available for other non-Vault-related workloads. The Kubernetes constructs of Node Labels and Node Selectors are utilized to notify the scheduler on which Nodes we'd like Consul and Vault workloads to land. Later sections of this document will discuss how to enforce the requirement that this same set of Nodes is dedicated for use by Consul and Vault workloads.
Recent versions of Kubernetes will often auto-label Nodes with a set of built-in labels using metadata from the hosting cloud provider. If the cloud provider does not support auto-labeling, these labels can be manually populated. Built-in labels in the diagram below are in black text.
It is worth noting that 'topology.kubernetes.io/zone' has special meaning within Kubernetes when used as a topologyKey: during scheduling Kubernetes will best-effort spread Pods evenly amongst the specified zones. This doc shows generic zone names such as 'az0' and 'az1' but, as an example, in AWS these might look like 'ca-central-1a' or 'ap-south-1b'.
In this doc Nodes are assumed to have been provisioned with unique hostnames, and thus the built-in label 'kubernetes.io/hostname' can be used, again via topologyKey, to best-effort spread Consul Server Pods and Vault Server Pods across our selected Nodes.
Nodes can also be provisioned with custom labels. In the diagram below our custom label denoting that a Node is reserved for use by a Vault workload, vault_in_k8s=true, is in blue text. Nodes without that label will not be used for Vault-related workloads and are included in the diagram only to emphasize this point.
In the diagram above nine Nodes have been labeled with vault_in_k8s=true. This k/v pair is referenced by our nodeSelector to inform Kubernetes where to place our Consul and Vault Pods.
Example:
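A minimal sketch of the relevant portion of a Pod spec, assuming the custom label shown above (the full spec will be generated by the Helm chart):

```yaml
# Pod spec excerpt: schedule only onto Nodes labeled vault_in_k8s=true.
spec:
  nodeSelector:
    vault_in_k8s: "true"
```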
The k/v pair and nodeSelector are necessary but not sufficient for our requirements:
1. There is no guarantee that a single Node will have only a Consul Server Pod or only a Vault Server Pod exclusively.
2. There is no guarantee that the Consul Server and Vault Server Pods will be distributed evenly amongst our availability zones.
3. There is no guarantee that untrusted Pods will not be scheduled onto Nodes where Consul and Vault are running.
In the next section podAntiAffinity scheduling will resolve items 1 and 2 above.
Spread Consul Server Pods and Vault Server Pods across Availability Zones and Nodes
In the previous section a Node Selector was used to request that Consul Server Pods and Vault Server Pods run on a select subset of the available Nodes. The next requirement is that Pods be evenly distributed amongst the availability zones and amongst the selected Nodes. Pods are spread across the availability zones to limit exposure to problems in a particular availability zone. Consul Server Pods and Vault Server Pods must run on separate Nodes to limit exposure to extreme resource pressure in those services. For example, if the Vault service is suffering from unusually high k/v write requests, no single Node will ever be required to handle both the resulting Vault k/v load and the resulting Consul k/v load. Ensuring the Pods are never co-tenant also makes rolling upgrades of Consul Pods, Vault Pods, and Kubernetes itself less error-prone and more reliable.
Pod anti-affinity rules keyed on two topology domains are utilized to ensure the two dimensions of spread mentioned above. As mentioned in the previous section, spread amongst availability zones is via the 'topology.kubernetes.io/zone' key, and spread amongst the Nodes is via the 'kubernetes.io/hostname' key.
Example Pod spec for Consul Server Pod:
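The excerpt below is a sketch of the scheduling-related portion of the Consul Server Pod spec; the label selectors (app, component) are assumptions for illustration, and the labels generated by the Helm chart may differ.

```yaml
# Consul Server Pod spec excerpt (scheduling only); labels are illustrative.
spec:
  nodeSelector:
    vault_in_k8s: "true"
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        # Never place two Consul servers on the same Node.
        - topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app: consul
              component: server
        # Never share a Node with a Vault server.
        - topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app: vault
              component: server
      preferredDuringSchedulingIgnoredDuringExecution:
        # Best-effort spread of Consul servers across availability zones.
        - weight: 100
          podAffinityTerm:
            topologyKey: topology.kubernetes.io/zone
            labelSelector:
              matchLabels:
                app: consul
                component: server
```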
Example Pod spec for Vault Server Pod:
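And a corresponding sketch for the Vault Server Pod spec, with the same caveat that the label values are illustrative:

```yaml
# Vault Server Pod spec excerpt (scheduling only); labels are illustrative.
spec:
  nodeSelector:
    vault_in_k8s: "true"
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        # Never place two Vault servers on the same Node.
        - topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app: vault
              component: server
        # Never share a Node with a Consul server.
        - topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app: consul
              component: server
      preferredDuringSchedulingIgnoredDuringExecution:
        # Best-effort spread of Vault servers across availability zones.
        - weight: 100
          podAffinityTerm:
            topologyKey: topology.kubernetes.io/zone
            labelSelector:
              matchLabels:
                app: vault
                component: server
```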
With the above configs Kubernetes will ensure best-effort spread of Consul Server Pods and Vault Server Pods amongst both AZs and Nodes. Node-level isolation of Consul and Vault workloads from general workloads is ensured via Taints and Tolerations, which are covered in the next section.
Ensuring Node-level isolation of Vault-related workloads
In the previous section we leveraged Kubernetes anti-affinity scheduling to ensure a desired spread of Consul Server Pods and Vault Server Pods along axes of Availability Zones and Nodes (identified by unique hostname). In this section the Kubernetes constructs of Taints and Tolerations ensure that Vault-related workloads never share a node with non-Vault-related workloads. As mentioned in previous sections Node-level isolation is a safeguard to prevent rogue workloads from penetrating a Node's OS-level and container runtime-level protections for a possible attack on Vault shared memory, Consul process memory, and Consul persistent storage.
Taints
First, the labeled Nodes dedicated for use by Consul Server Pods and Vault Server Pods must be tainted to prevent general workloads from running on them. They are tainted 'NoExecute' so that any running Pods will be removed from the Nodes before we place our intended Pods. The diagram below shows our partitioned Nodes with a newly applied taint, taint_for_consul_xor_vault=true:NoExecute. The taint is shown in blue for emphasis.
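For reference, the sketch below shows how the custom label and taint appear on a Node object; the Node name and zone value are placeholders, and in practice the label and taint would be applied by kubectl or by the cluster provisioning tooling.

```yaml
# Node object excerpt showing the custom label and the NoExecute taint.
# Node name and zone value are placeholders.
apiVersion: v1
kind: Node
metadata:
  name: node-az0-0
  labels:
    topology.kubernetes.io/zone: az0
    vault_in_k8s: "true"
spec:
  taints:
    - key: taint_for_consul_xor_vault
      value: "true"
      effect: NoExecute
```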
Tolerations
Once a Taint has been placed on a Node, a Pod spec must include a Toleration if the Pod is to run on that Node.
Example Pod spec (Consul and Vault) with Toleration:
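A minimal sketch of the Toleration stanza, matching the taint defined above, as it would appear in both the Consul Server and Vault Server Pod specs:

```yaml
# Pod spec excerpt: tolerate the taint placed on the dedicated Nodes.
spec:
  tolerations:
    - key: taint_for_consul_xor_vault
      operator: Equal
      value: "true"
      effect: NoExecute
```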
The diagrams below show the various scheduling configurations of our Consul Server Pods and Vault Server Pods to this point. Some things to note:
- Node Selector and Tolerations are shown in yellow text for emphasis.
- The diagram uses a custom syntax to denote anti-affinity rules as the actual syntax is too verbose to easily fit onto the diagram.
Consul Server Pod:
Vault Server Pod:
Consul Client Pods
Previous sections focused on proper scheduler configuration for Consul Server Pods and Vault Server Pods. There's an additional Pod type which will be part of the infrastructure: Consul Client Pods. The Consul Client Pod is used by the Vault Server Pod to find the Consul cluster which will be used for Vault secure storage. Consul Client Pods will be scheduled onto each Node which has been dedicated to Vault-related workload hosting. Strictly speaking, there's no requirement that a Consul Server Pod also have a Consul Client Pod but for the sake of simplicity a Consul Client Pod is scheduled everywhere a Vault Server Pod might be scheduled. This simplification comes at negligible additional resource cost for the Node.
Many of the same scheduling constructs already used will also be leveraged for Consul Client Pods, though with much less complexity. The Consul Client Pods are scheduled as a DaemonSet, which is limited to the partitioned Nodes via use of a Node Selector and a Toleration.
Node Selector
As with the Consul Server Pod and Vault Server Pod, the Node Selector is part of the Pod spec and is quite simple:
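A sketch of the Node Selector as it would appear in the DaemonSet's Pod template:

```yaml
# DaemonSet Pod template excerpt: restrict Consul Client Pods to labeled Nodes.
spec:
  template:
    spec:
      nodeSelector:
        vault_in_k8s: "true"
```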
This ensures that Pods from the DaemonSet can only be deployed onto Nodes labeled with vault_in_k8s=true. Remember, however, that the dedicated Nodes have a Taint applied. The DaemonSet must include a Toleration.
Tolerations
As with the Consul Server Pods and Vault Server Pods, a Toleration is specified for taint_for_consul_xor_vault=true:NoExecute.
Example DaemonSet Pod spec with Toleration:
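The sketch below pulls the pieces together for the Consul Client DaemonSet; the object name, labels, and container image tag are illustrative rather than the Helm chart's exact output.

```yaml
# Illustrative Consul Client DaemonSet limited to the dedicated, tainted Nodes.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: consul-client
spec:
  selector:
    matchLabels:
      app: consul
      component: client
  template:
    metadata:
      labels:
        app: consul
        component: client
    spec:
      nodeSelector:
        vault_in_k8s: "true"
      tolerations:
        - key: taint_for_consul_xor_vault
          operator: Equal
          value: "true"
          effect: NoExecute
      containers:
        - name: consul
          image: consul:1.7.2                  # image tag illustrative
          args: ["agent", "-client=0.0.0.0"]   # full client config (retry-join, etc.) omitted
```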
Deployed Infrastructure
All the required Labels, Node Selectors, Taints, Tolerations, and anti-affinity scheduling are now specified. The diagrams below demonstrate the resulting infrastructure, including Nodes and Pod placement.
Note that this results in a bit of extra capacity amongst the dedicated Nodes. There is a Node in Availability Zone 2 with a Consul Client but neither a Consul Server Pod nor a Vault Server Pod. That's by design. Remember that Kubernetes is doing a best-effort spread of Pods amongst the AZs and available Nodes. In the case of a Node failure that spare Node is available for re-scheduling of Pods previously on the failed Node.
Exposing the Vault Service
With the resilient Vault service available in the Kubernetes cluster, the next question becomes how to expose that Service to Vault Clients running outside of the Kubernetes cluster. There are three common constructs for exposing Kubernetes services to external/off-cluster clients: Load Balancer Services, Node Port Services, and Ingress.
Communication between Vault Clients and Vault Servers depends on the request path, client address, and TLS certificates, and thus only Load Balancer and Node Port, both Layer 4 proxies, are recommended at this time. Future versions of this document may include details on using Ingress. Both the Load Balancer and Node Port approaches require setting externalTrafficPolicy to 'Local' to preserve the Vault Client source addresses embedded in Vault client requests and responses.
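A sketch of such a Service, assuming the Vault Server Pods carry the illustrative labels used earlier and listen on the default Vault port 8200:

```yaml
# Illustrative external Service for the Vault API; the name and selector
# labels are assumptions, and the type could also be NodePort.
apiVersion: v1
kind: Service
metadata:
  name: vault-external
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # preserve client source addresses
  selector:
    app: vault
    component: server
  ports:
    - name: https
      port: 8200
      targetPort: 8200
```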