Telemetry

The Vault server process collects various runtime metrics about the performance of different libraries and subsystems. These metrics are aggregated on a ten-second interval and retained for one minute in memory. Telemetry from Vault must be stored in metrics aggregation software to monitor Vault and collect durable metrics.

To view the raw data, you must send a signal to the Vault process: on Unix-style operating systems, this is USR1, while on Windows, it is BREAK. When the Vault process receives this signal, it will dump the current telemetry information to the process's stderr.
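As a minimal sketch of the Unix case, the signal can be sent with Python's standard library. The helper name `request_telemetry_dump` is a hypothetical illustration and not part of Vault, and the caller is assumed to already know the PID of the Vault server. The dump is written to the Vault process's own stderr, so the helper returns nothing.

```python
import os
import signal


def request_telemetry_dump(pid: int) -> None:
    """Send SIGUSR1 to the process with the given PID (Unix-style systems only).

    Hypothetical helper: the target process is expected to react by writing
    its current telemetry to its own stderr, so there is no return value.
    """
    os.kill(pid, signal.SIGUSR1)
```

Calling `request_telemetry_dump(pid)` requires permission to signal the target process, typically the same user or root.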

This telemetry information can be used for debugging purposes and provides users with insights into Vault's runtime.

Telemetry information can also be streamed directly from Vault to a range of metrics aggregation solutions, as described in the telemetry stanza documentation.

The following is an example of a telemetry dump snippet:

[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.expire.num_leases': 5100.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.num_goroutines': 39.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.sys_bytes': 222746880.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.malloc_count': 109189192.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.free_count': 108408240.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.heap_objects': 780953.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.total_gc_runs': 232.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.alloc_bytes': 72954392.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.total_gc_pause_ns': 150293024.000
[2017-12-19 20:37:50 +0000 UTC][S] 'vault.merkle.flushDirty': Count: 100 Min: 0.008 Mean: 0.027 Max: 0.183 Stddev: 0.024 Sum: 2.681 LastUpdated: 2017-12-19 20:37:59.848733035 +0000 UTC m=+10463.692105920
[2017-12-19 20:37:50 +0000 UTC][S] 'vault.merkle.saveCheckpoint': Count: 4 Min: 0.021 Mean: 0.054 Max: 0.110 Stddev: 0.039 Sum: 0.217 LastUpdated: 2017-12-19 20:37:57.048458148 +0000 UTC m=+10460.891835029
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.alloc_bytes': 73326136.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.sys_bytes': 222746880.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.malloc_count': 109195904.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.free_count': 108409568.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.heap_objects': 786342.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.total_gc_pause_ns': 150293024.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.expire.num_leases': 5100.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.num_goroutines': 39.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.total_gc_runs': 232.000
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.route.rollback.consul-': Count: 1 Sum: 0.013 LastUpdated: 2017-12-19 20:38:01.968471579 +0000 UTC m=+10465.811842067
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.rollback.attempt.consul-': Count: 1 Sum: 0.073 LastUpdated: 2017-12-19 20:38:01.968502743 +0000 UTC m=+10465.811873131
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.rollback.attempt.pki-': Count: 1 Sum: 0.070 LastUpdated: 2017-12-19 20:38:01.96867005 +0000 UTC m=+10465.812041936
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.route.rollback.auth-app-id-': Count: 1 Sum: 0.012 LastUpdated: 2017-12-19 20:38:01.969146401 +0000 UTC m=+10465.812516689
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.rollback.attempt.identity-': Count: 1 Sum: 0.063 LastUpdated: 2017-12-19 20:38:01.968029888 +0000 UTC m=+10465.811400276
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.rollback.attempt.database-': Count: 1 Sum: 0.066 LastUpdated: 2017-12-19 20:38:01.969394215 +0000 UTC m=+10465.812764603
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.barrier.get': Count: 16 Min: 0.010 Mean: 0.015 Max: 0.031 Stddev: 0.005 Sum: 0.237 LastUpdated: 2017-12-19 20:38:01.983268118 +0000 UTC m=+10465.826637008
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.merkle.flushDirty': Count: 100 Min: 0.006 Mean: 0.024 Max: 0.098 Stddev: 0.019 Sum: 2.386 LastUpdated: 2017-12-19 20:38:09.848158309 +0000 UTC m=+10473.691527099

You'll note that log entries are prefixed with the metric type as follows:

  • [C] is a counter. Counters are cumulative metrics that are incremented when some event occurs, and reset at the end of each reporting interval. Vault retains counters and other metrics for one minute in memory, so an aggregation solution must be configured to see accurate and persistent counters over time.
  • [G] is a gauge. Gauges provide measurements of current values.
  • [S] is a summary. Summaries provide sample observations of values. Vault commonly uses summaries for measuring the timing duration of discrete events in the reporting interval.
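The prefixes above make dump lines easy to split mechanically. The following is a rough sketch of a parser, assuming the line layout seen in the sample dump (`[timestamp][type] 'name': payload`). The function name `parse_dump_line` and the returned dictionary shape are illustrative choices, not a Vault API.

```python
import re
from typing import Dict, Optional

# Assumed layout, inferred from the sample dump:
#   [<timestamp>][<C|G|S>] '<metric name>': <payload>
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<kind>[CGS])\] '(?P<name>[^']+)': (?P<payload>.*)$"
)
# Aggregate fields that appear on summary lines in the sample.
FIELD_RE = re.compile(r"(Count|Min|Mean|Max|Stddev|Sum): (\S+)")

KINDS = {"C": "counter", "G": "gauge", "S": "summary"}


def parse_dump_line(line: str) -> Optional[Dict[str, object]]:
    """Parse one telemetry dump line; return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    payload = match.group("payload")
    fields = {key.lower(): float(value) for key, value in FIELD_RE.findall(payload)}
    if not fields:
        # Gauge lines in the sample carry a single number as their payload.
        fields = {"value": float(payload)}
    return {
        "timestamp": match.group("ts"),
        "type": KINDS[match.group("kind")],
        "name": match.group("name"),
        "fields": fields,
    }
```

Feeding the `vault.barrier.get` line from the sample through this function yields a `summary` entry whose `fields` include `count`, `min`, `mean`, `max`, `stddev`, and `sum`.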

The following sections describe the available Vault metrics. The metrics interval is approximately 10 seconds when manually triggering metrics output using the signals described above. Some high-cardinality gauges, like vault.secret.kv.count, are emitted every 10 minutes, or at an interval configured in the telemetry stanza.

Some Vault metrics come with additional labels describing the measurement in more detail, such as the namespace in which an operation takes place or the auth method used to create a token. In the in-memory telemetry, and in other telemetry engines that do not support labels, this additional information is incorporated into the metric name. In the tables below, each metric name is followed by a list of the labels it supports, in the order in which they appear when flattened.

Audit Metrics

These metrics relate to auditing.

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| vault.audit.log_request | Duration of time taken by all audit log requests across all audit log devices | ms | summary |
| vault.audit.log_response | Duration of time taken by audit log responses across all audit log devices | ms | summary |
| vault.audit.log_request_failure | Number of audit log request failures. NOTE: This is a crucial metric. A non-zero value indicates that a failure occurred while sending an audit log request to a configured audit log device. If Vault cannot log to a configured audit log device, it ceases all user operations. If this metric increases regularly, troubleshoot the audit log devices immediately. | failures | counter |
| vault.audit.log_response_failure | Number of audit log response failures. NOTE: This is a crucial metric. A non-zero value indicates that a failure occurred while receiving a response to a request made to one of the configured audit log devices. If Vault cannot log to a configured audit log device, it ceases all user operations. If this metric consistently reports a non-zero value, troubleshoot the audit log devices. | failures | counter |

NOTE: In addition, there are audit metrics for each enabled audit device, represented as vault.audit.<type>.log_request and vault.audit.<type>.log_response. For example, if a file audit device is enabled, its metrics would be vault.audit.file.log_request and vault.audit.file.log_response.

Core Metrics

These metrics represent operational aspects of the running Vault instance.

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| vault.barrier.delete | Duration of time taken by DELETE operations at the barrier | ms | summary |
| vault.barrier.get | Duration of time taken by GET operations at the barrier | ms | summary |
| vault.barrier.put | Duration of time taken by PUT operations at the barrier | ms | summary |
| vault.barrier.list | Duration of time taken by LIST operations at the barrier | ms | summary |
| vault.cache.hit | Number of times a value was retrieved from the LRU cache. | cache hit | counter |
| vault.cache.miss | Number of times a value was not in the LRU cache. This results in a read from the configured storage. | cache miss | counter |
| vault.cache.write | Number of times a value was written to the LRU cache. | cache write | counter |
| vault.cache.delete | Number of times a value was deleted from the LRU cache. This does not count cache expirations. | cache delete | counter |
| vault.core.active | Has a value of 1 when the Vault node is active, and 0 when the node is in standby. | bool | gauge |
| vault.core.activity.fragment_size | Number of entities or tokens (depending on the "type" label) observed by the local node. | tokens | counter |
| vault.core.activity.segment_write | Duration of time taken writing activity log segments to storage. | ms | summary |
| vault.core.check_token | Duration of time taken by token checks handled by Vault core | ms | summary |
| vault.core.fetch_acl_and_token | Duration of time taken by ACL and corresponding token entry fetches handled by Vault core | ms | summary |
| vault.core.handle_request | Duration of time taken by non-login requests handled by Vault core | ms | summary |
| vault.core.handle_login_request | Duration of time taken by login requests handled by Vault core | ms | summary |
| vault.core.in_flight_requests | Number of in-flight requests. | requests | gauge |
| vault.core.leadership_setup_failed | Duration of time taken by cluster leadership setup failures which have occurred in a highly available Vault cluster. This should be monitored and alerted on for overall cluster leadership status. | ms | summary |
| vault.core.leadership_lost | The total duration that an HA cluster node maintained leadership, as reported at the last time of loss. Any count greater than zero means that a leadership change has occurred. Continuing changes or reports of low values could be a cause for monitoring alerts, as they typically imply ongoing flapping of leadership that may rotate between nodes. | ms | summary |
| vault.core.license.expiration_time_epoch | Time as epoch (seconds since Jan 1 1970) at which the license will expire. | seconds | gauge |
| vault.core.mount_table.num_entries | Number of mounts in a particular mount table. This metric is labeled by table type (auth or logical) and whether or not the table is replicated (local or not) | objects | gauge |
| vault.core.mount_table.size | Size of a particular mount table. This metric is labeled by table type (auth or logical) and whether or not the table is replicated (local or not) | bytes | gauge |
| vault.core.post_unseal | Duration of time taken by post-unseal operations handled by Vault core | ms | summary |
| vault.core.pre_seal | Duration of time taken by pre-seal operations | ms | summary |
| vault.core.seal-with-request | Duration of time taken by requested seal operations | ms | summary |
| vault.core.seal | Duration of time taken by seal operations | ms | summary |
| vault.core.seal-internal | Duration of time taken by internal seal operations | ms | summary |
| vault.core.step_down | Duration of time taken by cluster leadership step downs. This should be monitored, and alerts set for overall cluster leadership status. | ms | summary |
| vault.core.unseal | Duration of time taken by unseal operations | ms | summary |
| vault.core.unsealed | Has a value of 1 when Vault is unsealed, and 0 when Vault is sealed. | bool | gauge |
| vault.metrics.collection (cluster,gauge) | Time taken to collect usage gauges, labeled by gauge type. | | summary |
| vault.metrics.collection.interval (cluster,gauge) | The current value of the usage gauge collection interval. | | summary |
| vault.metrics.collection.error (cluster,gauge) | Errors while collecting usage gauges, labeled by gauge type. | | counter |
| vault.rollback.attempt.<mountpoint> | Time taken to perform a rollback operation on the given mount point. The mount point name has its forward slashes / replaced by -. For example, a rollback operation on the auth/token backend is reported as vault.rollback.attempt.auth-token-. | ms | summary |
| vault.route.create.<mountpoint> | Time taken to dispatch a create operation to a backend, and for that backend to process it. The mount point name has its forward slashes / replaced by -. For example, a create operation to ns1/secret/ would have the corresponding metric vault.route.create.ns1-secret-. The number of samples of this metric, and the corresponding ones for other operations below, indicates how many operations were performed per mount point. | ms | summary |
| vault.route.delete.<mountpoint> | Time taken to dispatch a delete operation to a backend, and for that backend to process it. | ms | summary |
| vault.route.list.<mountpoint> | Time taken to dispatch a list operation to a backend, and for that backend to process it. | ms | summary |
| vault.route.read.<mountpoint> | Time taken to dispatch a read operation to a backend, and for that backend to process it. | ms | summary |
| vault.route.rollback.<mountpoint> | Time taken to dispatch a rollback operation to a backend, and for that backend to process it. Rollback operations are automatically scheduled to clean up partial errors. | ms | summary |
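The per-mount metric names in the table above follow a simple substitution rule: forward slashes in the mount path become hyphens. A small sketch of that rule is shown here. The function names are hypothetical, and the mount path is assumed to include its trailing slash, as in the ns1/secret/ example.

```python
def flatten_mount_point(mount_point: str) -> str:
    """Replace forward slashes with hyphens, as described for per-mount metrics."""
    return mount_point.replace("/", "-")


def route_metric_name(operation: str, mount_point: str) -> str:
    """Build a vault.route.<operation>.<mountpoint> metric name."""
    return f"vault.route.{operation}.{flatten_mount_point(mount_point)}"


def rollback_attempt_metric_name(mount_point: str) -> str:
    """Build a vault.rollback.attempt.<mountpoint> metric name."""
    return f"vault.rollback.attempt.{flatten_mount_point(mount_point)}"
```

With these helpers, `route_metric_name("create", "ns1/secret/")` gives `vault.route.create.ns1-secret-`, and `rollback_attempt_metric_name("auth/token/")` gives `vault.rollback.attempt.auth-token-`, matching the two examples in the table.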

Runtime Metrics

These metrics collect information from Vault's Go runtime, such as memory usage information.

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| vault.runtime.alloc_bytes | Number of bytes allocated by the Vault process. The number of bytes may peak from time to time, but should return to a steady state value. | bytes | gauge |
| vault.runtime.free_count | Number of freed objects | objects | gauge |
| vault.runtime.heap_objects | Number of objects on the heap. This is a good general memory pressure indicator, worth establishing a baseline and thresholds for alerting. | objects | gauge |
| vault.runtime.malloc_count | Cumulative count of allocated heap objects | objects | gauge |
| vault.runtime.num_goroutines | Number of goroutines. This serves as a general system load indicator, worth establishing a baseline and thresholds for alerting. | goroutines | gauge |
| vault.runtime.sys_bytes | Number of bytes allocated to Vault. This includes what is being used by Vault's heap and what has been reclaimed but not given back to the operating system. | bytes | gauge |
| vault.runtime.total_gc_pause_ns | The total garbage collector pause time since Vault was last started | ns | gauge |
| vault.runtime.gc_pause_ns | Total duration of the last garbage collection run | ns | summary |
| vault.runtime.total_gc_runs | Total number of garbage collection runs since Vault was last started | operations | gauge |
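Several of the runtime gauges above can be combined into derived values. The sketch below assumes that malloc_count and free_count are cumulative counts, so their difference approximates the number of live heap objects, and that dividing total pause time by the number of runs gives a mean pause per collection. The function names are illustrative only.

```python
def live_heap_objects(malloc_count: float, free_count: float) -> float:
    """Approximate live heap objects as cumulative allocations minus frees."""
    return malloc_count - free_count


def mean_gc_pause_ms(total_gc_pause_ns: float, total_gc_runs: float) -> float:
    """Mean garbage collection pause in milliseconds since process start."""
    if total_gc_runs == 0:
        # No collections yet, so there is no pause to average.
        return 0.0
    return total_gc_pause_ns / total_gc_runs / 1_000_000
```

Using the 20:37:50 values from the sample dump, `live_heap_objects(109189192, 108408240)` gives 780952, one short of the reported heap_objects gauge of 780953, and `mean_gc_pause_ms(150293024, 232)` is roughly 0.65 ms.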

Policy Metrics

These metrics report measurements of the time spent performing policy operations.

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| vault.policy.get_policy | Time taken to get a policy | ms | summary |
| vault.policy.list_policies | Time taken to list policies | ms | summary |
| vault.policy.delete_policy | Time taken to delete a policy | ms | summary |
| vault.policy.set_policy | Time taken to set a policy | ms | summary |

Token, Identity, and Lease Metrics

These metrics cover the measurement of token, identity, and lease operations, and counts of the number of such objects managed by Vault.

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| vault.expire.fetch-lease-times | Time taken to retrieve lease times | ms | summary |
| vault.expire.fetch-lease-times-by-token | Time taken to retrieve lease times by token | ms | summary |
| vault.expire.num_leases | Number of all leases which are eligible for eventual expiry | leases | gauge |
| vault.expire.num_irrevocable_leases | Number of leases that cannot be revoked automatically | leases | gauge |
| vault.expire.leases.by_expiration (cluster,gauge,expiring,namespace) | The number of leases set to expire, grouped by a time interval. This specific time interval and the total number of time intervals are configurable via lease_metrics_epsilon and num_lease_metrics_buckets in the telemetry stanza of a Vault server configuration. The default values for these are 1hr and 168 respectively, so the metric will report the number of leases that will expire each hour from the current time to a week from the present time. You can additionally group lease expiration by namespace by setting add_lease_metrics_namespace_labels to true in the config file (default is false). | leases | gauge |
| vault.expire.job_manager.total_jobs | Total pending revocation jobs | leases | summary |
| vault.expire.job_manager.queue_length | Total pending revocation jobs by auth method | leases | summary |
| vault.expire.lease_expiration | Count of lease expirations | leases | counter |
| vault.expire.lease_expiration.time_in_queue | Time taken for a lease to get to the front of the revoke queue | ms | summary |
| vault.expire.lease_expiration.error | Count of lease expiration errors | errors | counter |
| vault.expire.revoke | Time taken to revoke a token | ms | summary |
| vault.expire.revoke-force | Time taken to revoke a token forcibly | ms | summary |
| vault.expire.revoke-prefix | Time taken to revoke tokens on a prefix | ms | summary |
| vault.expire.revoke-by-token | Time taken to revoke all secrets issued with a given token | ms | summary |
| vault.expire.renew | Time taken to renew a lease | ms | summary |
| vault.expire.renew-token | Time taken to renew a token which does not need to invoke a logical backend | ms | summary |
| vault.expire.register | Time taken for register operations | ms | summary |
| vault.expire.register-auth | Time taken for register authentication operations which create lease entries without lease ID | ms | summary |
| vault.identity.num_entities | The number of identity entities stored in Vault | entities | gauge |
| vault.identity.entity.active.monthly (cluster, namespace) | The number of distinct entities that created a token during the past month, per namespace. Only available if client count is enabled. Reported at the start of each month. | entities | gauge |
| vault.identity.entity.active.partial_month (cluster) | The total number of distinct entities that have created a token during the current month. Only available if client count is enabled. Reported periodically within each month. | entities | gauge |
| vault.identity.entity.active.reporting_period (cluster, namespace) | The number of distinct entities that created a token in the past N months, where N is defined by the client count default reporting period. Only available if client count is enabled. Reported at the start of each month. | entities | gauge |
| vault.identity.entity.alias.count (cluster, namespace, auth_method, mount_point) | The number of identity entity aliases stored in Vault, grouped by the auth mount that created them. This gauge is computed every 10 minutes. | aliases | gauge |
| vault.identity.entity.count (cluster, namespace) | The number of identity entities stored in Vault, grouped by namespace. | entities | gauge |
| vault.identity.entity.creation (cluster, namespace, auth_method, mount_point) | The number of identity entities created, grouped by the auth mount that created them. | entities | counter |
| vault.identity.upsert_entity_txn | Time taken to insert a new or modified entity into the in-memory database, and persist it to storage. | ms | summary |
| vault.identity.upsert_group_txn | Time taken to insert a new or modified group into the in-memory database, and persist it to storage. This operation is performed on group membership changes. | ms | summary |
| vault.token.count (cluster, namespace) | Number of service tokens available for use; counts all un-expired and un-revoked tokens in Vault's token store. This measurement is performed every 10 minutes. | tokens | gauge |
| vault.token.count.by_auth (cluster, namespace, auth_method) | Number of service tokens that were created by a particular auth method. | tokens | gauge |
| vault.token.count.by_policy (cluster, namespace, policy) | Number of service tokens that have a particular policy attached. If a token has more than one policy, it is counted in each policy gauge. | tokens | gauge |
| vault.token.count.by_ttl (cluster, namespace, creation_ttl) | Number of service tokens, grouped by the TTL range they were assigned at creation. | tokens | gauge |
| vault.token.create | The time taken to create a token | ms | summary |
| vault.token.create_root | Number of created root tokens. Does not decrease on revocation. | tokens | counter |
| vault.token.createAccessor | The time taken to create a token accessor | ms | summary |
| vault.token.creation (cluster, namespace, auth_method, mount_point, creation_ttl, token_type) | Number of service or batch tokens created. | tokens | counter |
| vault.token.lookup | The time taken to look up a token | ms | summary |
| vault.token.revoke | Time taken to revoke a token | ms | summary |
| vault.token.revoke-tree | Time taken to revoke a token tree | ms | summary |
| vault.token.store | Time taken to store an updated token entry without writing to the secondary index | ms | summary |
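The defaults quoted for vault.expire.leases.by_expiration in the table above can be checked with a line of arithmetic: 168 buckets of one hour each cover 168 hours, which is seven days. The sketch below expresses this in Python. The function name is hypothetical, and the parameters mirror lease_metrics_epsilon and num_lease_metrics_buckets only by analogy.

```python
from datetime import timedelta


def lease_metrics_horizon(epsilon: timedelta, num_buckets: int) -> timedelta:
    """Total time span covered by the lease expiration buckets."""
    return epsilon * num_buckets
```

With the defaults, `lease_metrics_horizon(timedelta(hours=1), 168)` equals `timedelta(days=7)`. Halving the interval to 30 minutes while keeping 168 buckets would cover only three and a half days.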

Resource Quota Metrics

These metrics relate to rate limit and lease count quotas. Each metric comes with a label "name" identifying the specific quota.

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| vault.quota.rate_limit.violation | Total number of rate limit quota violations | quota | counter |
| vault.quota.lease_count.violation | Total number of lease count quota violations | quota | counter |
| vault.quota.lease_count.max | Total maximum number of leases allowed by the lease count quota | lease | gauge |
| vault.quota.lease_count.counter | Total current number of leases generated by the lease count quota | lease | gauge |

Merkle Tree and Write Ahead Log Metrics

These metrics relate to internal operations on Merkle Trees and Write Ahead Logs (WAL).

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| vault.merkle.flushDirty | Time taken to flush any dirty pages to cold storage | ms | summary |
| vault.merkle.flushDirty.num_pages | Number of pages flushed | pages | gauge |
| vault.merkle.flushDirty.outstanding_pages | Number of pages that were not flushed | pages | gauge |
| vault.merkle.saveCheckpoint | Time taken to save the checkpoint | ms | summary |
| vault.merkle.saveCheckpoint.num_dirty | Number of dirty pages at checkpoint | pages | gauge |
| vault.wal.deleteWALs | Time taken to delete a Write Ahead Log (WAL) | ms | summary |
| vault.wal.gc.deleted | Number of Write Ahead Logs (WAL) deleted during each garbage collection run | WAL | gauge |
| vault.wal.gc.total | Total Number of Write Ahead Logs (WAL) on disk | WAL | gauge |
| vault.wal.loadWAL | Time taken to load a Write Ahead Log (WAL) | ms | summary |
| vault.wal.persistWALs | Time taken to persist a Write Ahead Log (WAL) | ms | summary |
| vault.wal.flushReady | Time taken to flush a ready Write Ahead Log (WAL) to storage | ms | summary |
| vault.wal.flushReady.queue_len | Size of the write queue in the WAL system | WAL | summary |

HA Metrics

These metrics are emitted on standbys when talking to the active node, and in some cases by performance standbys as well.

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| vault.ha.rpc.client.forward | Time taken to forward a request from a standby to the active node | ms | summary |
| vault.ha.rpc.client.forward.errors | Number of standby request forwarding failures | errors | counter |
| vault.ha.rpc.client.echo | Time taken to send an echo request from a standby to the active node | ms | summary |
| vault.ha.rpc.client.echo.errors | Number of standby echo request failures | errors | counter |

Replication Metrics

These metrics relate to Vault Enterprise Replication. The following metrics are not available in telemetry unless replication is in an unhealthy state: replication.fetchRemoteKeys, replication.merkleDiff, and replication.merkleSync.

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| vault.core.replication.performance.primary | Set to 1 if this is a performance primary, 0 if not | boolean | gauge |
| vault.core.replication.performance.secondary | Set to 1 if this is a performance secondary, 0 if not | boolean | gauge |
| vault.core.replication.dr.primary | Set to 1 if this is a DR primary, 0 if not | boolean | gauge |
| vault.core.replication.dr.secondary | Set to 1 if this is a DR secondary, 0 if not | boolean | gauge |
| vault.core.performance_standby | Set to 1 if this is a performance standby, 0 if not | boolean | gauge |
| vault.logshipper.streamWALs.missing_guard | Number of incidences where the starting Merkle Tree index used to begin streaming WAL entries is not matched/found | missing guards | counter |
| vault.logshipper.streamWALs.guard_found | Number of incidences where the starting Merkle Tree index used to begin streaming WAL entries is matched/found | found guards | counter |
| vault.logshipper.streamWALs.scanned_entries | Number of entries scanned in the buffer before the right one was found. | scanned entries | summary |
| vault.logshipper.buffer.length | Current length of the log shipper buffer | buffer entries | gauge |
| vault.logshipper.buffer.size | Current size in bytes of the log shipper buffer | bytes | gauge |
| vault.logshipper.buffer.max_length | Maximum length of the log shipper buffer | buffer entries | gauge |
| vault.logshipper.buffer.max_size | Maximum size in bytes of the log shipper buffer | bytes | gauge |
| vault.replication.fetchRemoteKeys | Time taken to fetch keys from a remote cluster participating in replication prior to Merkle Tree based delta generation | ms | summary |
| vault.replication.merkleDiff | Time taken to perform a Merkle Tree based delta generation between the clusters participating in replication | ms | summary |
| vault.replication.merkleSync | Time taken to perform a Merkle Tree based synchronization using the last delta generated between the clusters participating in replication | ms | summary |
| vault.replication.merkle.commit_index | The last committed index in the Merkle Tree. | sequence number | gauge |
| vault.replication.wal.last_wal | The index of the last WAL | sequence number | gauge |
| vault.replication.wal.last_dr_wal | The index of the last DR WAL | sequence number | gauge |
| vault.replication.wal.last_performance_wal | The index of the last Performance WAL | sequence number | gauge |
| vault.replication.fsm.last_remote_wal | The index of the last remote WAL | sequence number | gauge |
| vault.replication.wal.gc | Time taken to complete one run of the WAL garbage collection process | ms | summary |
| vault.replication.rpc.server.auth_request | Duration of time taken by auth request | ms | summary |
| vault.replication.rpc.server.bootstrap_request | Duration of time taken by bootstrap request | ms | summary |
| vault.replication.rpc.server.conflicting_pages_request | Duration of time taken by conflicting pages request | ms | summary |
| vault.replication.rpc.server.echo | Duration of time taken by echo | ms | summary |
| vault.replication.rpc.server.save_mfa_response_auth | Duration of time taken by saving MFA auth response | ms | summary |
| vault.replication.rpc.server.forwarding_request | Duration of time taken by forwarding request | ms | summary |
| vault.replication.rpc.server.guard_hash_request | Duration of time taken by guard hash request | ms | summary |
| vault.replication.rpc.server.persist_alias_request | Duration of time taken by persist alias request | ms | summary |
| vault.replication.rpc.server.persist_persona_request | Duration of time taken by persist persona request | ms | summary |
| vault.replication.rpc.server.stream_wals_request | Duration of time taken by stream wals request | ms | summary |
| vault.replication.rpc.server.sub_page_hashes_request | Duration of time taken by sub page hashes request | ms | summary |
| vault.replication.rpc.server.sync_counter_request | Duration of time taken by sync counter request | ms | summary |
| vault.replication.rpc.server.upsert_group_request | Duration of time taken by upsert group request | ms | summary |
| vault.replication.rpc.client.conflicting_pages | Duration of time taken by client conflicting pages request | ms | summary |
| vault.replication.rpc.client.fetch_keys | Duration of time taken by client fetch keys request | ms | summary |
| vault.replication.rpc.client.forward | Duration of time taken by client forward request | ms | summary |
| vault.replication.rpc.client.guard_hash | Duration of time taken by client guard hash request | ms | summary |
| vault.replication.rpc.client.persist_alias | Duration of time taken by client persist alias request | ms | summary |
| vault.replication.rpc.client.register_auth | Duration of time taken by client register auth request | ms | summary |
| vault.replication.rpc.client.register_lease | Duration of time taken by client register lease request | ms | summary |
| vault.replication.rpc.client.stream_wals | Duration of time taken by client stream wals request | ms | summary |
| vault.replication.rpc.client.sub_page_hashes | Duration of time taken by client sub page hashes request | ms | summary |
| vault.replication.rpc.client.sync_counter | Duration of time taken by client sync counter request | ms | summary |
| vault.replication.rpc.client.upsert_group | Duration of time taken by client upsert group request | ms | summary |
| vault.replication.rpc.client.wrap_in_cubbyhole | Duration of time taken by client wrap in cubbyhole request | ms | summary |
| vault.replication.rpc.client.save_mfa_response_auth | Duration of time taken by client saving MFA auth response | ms | summary |
| vault.replication.rpc.dr.server.echo | Duration of time taken by DR echo request | ms | summary |
| vault.replication.rpc.dr.server.fetch_keys_request | Duration of time taken by DR fetch keys request | ms | summary |
| vault.replication.rpc.standby.server.echo | Duration of time taken by standby echo request | ms | summary |
| vault.replication.rpc.standby.server.register_auth_request | Duration of time taken by standby register auth request | ms | summary |
| vault.replication.rpc.standby.server.register_lease_request | Duration of time taken by standby register lease request | ms | summary |
| vault.replication.rpc.standby.server.wrap_token_request | Duration of time taken by standby wrap token request | ms | summary |

Secrets Engines Metrics

These metrics relate to the supported secrets engines.

| Metric | Description | Unit | Type |
| --- | --- | --- | --- |
| database.Initialize | Time taken to initialize a database secrets engine across all database secrets engines | ms | summary |
| database.<name>.Initialize | Time taken to initialize a database secrets engine for the named database secrets engine <name>, for example: database.postgresql-prod.Initialize | ms | summary |
| database.Initialize.error | Number of database secrets engine initialization operation errors across all database secrets engines | errors | counter |
| database.<name>.Initialize.error | Number of database secrets engine initialization operation errors for the named database secrets engine <name>, for example: database.postgresql-prod.Initialize.error | errors | counter |
| database.Close | Time taken to close a database secrets engine across all database secrets engines | ms | summary |
| database.<name>.Close | Time taken to close a database secrets engine for the named database secrets engine <name>, for example: database.postgresql-prod.Close | ms | summary |
| database.Close.error | Number of database secrets engine close operation errors across all database secrets engines | errors | counter |
| database.<name>.Close.error | Number of database secrets engine close operation errors for the named database secrets engine <name>, for example: database.postgresql-prod.Close.error | errors | counter |
| database.CreateUser | Time taken to create a user across all database secrets engines | ms | summary |
| database.<name>.CreateUser | Time taken to create a user for the named database secrets engine <name> | ms | summary |
| database.CreateUser.error | Number of user creation operation errors across all database secrets engines | errors | counter |
| database.<name>.CreateUser.error | Number of user creation operation errors for the named database secrets engine <name>, for example: database.postgresql-prod.CreateUser.error | errors | counter |
| database.RenewUser | Time taken to renew a user across all database secrets engines | ms | summary |
| database.<name>.RenewUser | Time taken to renew a user for the named database secrets engine <name>, for example: database.postgresql-prod.RenewUser | ms | summary |
| database.RenewUser.error | Number of user renewal operation errors across all database secrets engines | errors | counter |
| database.<name>.RenewUser.error | Number of user renewal operation errors for the named database secrets engine <name>, for example: database.postgresql-prod.RenewUser.error | errors | counter |
| database.RevokeUser | Time taken to revoke a user across all database secrets engines | ms | summary |
| database.<name>.RevokeUser | Time taken to revoke a user for the named database secrets engine <name>, for example: database.postgresql-prod.RevokeUser | ms | summary |
| database.RevokeUser.error | Number of user revocation operation errors across all database secrets engines | errors | counter |
| database.<name>.RevokeUser.error | Number of user revocation operation errors for the named database secrets engine <name>, for example: database.postgresql-prod.RevokeUser.error | errors | counter |
| secrets.pki.tidy.cert_store_current_entry | The index of the current entry in the certificate store being verified by the tidy operation | entry index | gauge |
| secrets.pki.tidy.cert_store_deleted_count | Number of entries deleted from the certificate store | entry | counter |
| secrets.pki.tidy.cert_store_total_entries | Number of entries in the certificate store to verify during the tidy operation | entry | gauge |
| secrets.pki.tidy.cert_store_total_entries_remaining | Number of entries in the certificate store that are left after the tidy operation (checked but not removed). | entry | gauge |
| secrets.pki.tidy.duration | Duration of time taken by the PKI tidy operation | ms | summary |
| secrets.pki.tidy.failure | Number of times the PKI tidy operation has not completed due to errors | operations | counter |
| secrets.pki.tidy.revoked_cert_current_entry | The index of the current revoked certificate entry in the certificate store being verified by the tidy operation | entry index | gauge |
| secrets.pki.tidy.revoked_cert_deleted_count | Number of entries deleted from the certificate store for revoked certificates | entry | counter |
| secrets.pki.tidy.revoked_cert_total_entries | Number of entries in the certificate store for revoked certificates to verify during the tidy operation | entry | gauge |
| secrets.pki.tidy.revoked_cert_total_entries_remaining | Number of entries in the certificate store for revoked certificates that are left after the tidy operation (checked but not removed). | entry | gauge |
| secrets.pki.tidy.revoked_cert_total_entries_incorrect_issuers | Number of entries in the certificate store which had incorrect issuer information (total). | entry | gauge |
| secrets.pki.tidy.revoked_cert_total_entries_fixed_issuers | Number of entries in the certificate store which had incorrect issuer information that was fixed during this tidy operation. | entry | gauge |
| secrets.pki.tidy.start_time_epoch | Start time (as seconds since Jan 1 1970) when the PKI tidy operation is active, 0 otherwise | seconds | gauge |
| secrets.pki.tidy.success | Number of times the PKI tidy operation has been completed successfully | operations | counter |
| vault.secret.kv.count (cluster, namespace, mount_point) | Number of entries in each key-value secrets engine. | paths | gauge |
| vault.secret.lease.creation (cluster, namespace, secret_engine, mount_point, creation_ttl) | Counts the number of leases created by secrets engines. | leases | counter |

Storage Backend Metrics

These metrics relate to the supported storage backends.

Metric | Description | Unit | Type
vault.azure.put | Duration of a PUT operation against the Azure storage backend | ms | summary
vault.azure.get | Duration of a GET operation against the Azure storage backend | ms | summary
vault.azure.delete | Duration of a DELETE operation against the Azure storage backend | ms | summary
vault.azure.list | Duration of a LIST operation against the Azure storage backend | ms | summary
vault.cassandra.put | Duration of a PUT operation against the Cassandra storage backend | ms | summary
vault.cassandra.get | Duration of a GET operation against the Cassandra storage backend | ms | summary
vault.cassandra.delete | Duration of a DELETE operation against the Cassandra storage backend | ms | summary
vault.cassandra.list | Duration of a LIST operation against the Cassandra storage backend | ms | summary
vault.cockroachdb.put | Duration of a PUT operation against the CockroachDB storage backend | ms | summary
vault.cockroachdb.get | Duration of a GET operation against the CockroachDB storage backend | ms | summary
vault.cockroachdb.delete | Duration of a DELETE operation against the CockroachDB storage backend | ms | summary
vault.cockroachdb.list | Duration of a LIST operation against the CockroachDB storage backend | ms | summary
vault.consul.put | Duration of a PUT operation against the Consul storage backend | ms | summary
vault.consul.transaction | Duration of a Txn operation against the Consul storage backend | ms | summary
vault.consul.get | Duration of a GET operation against the Consul storage backend | ms | summary
vault.consul.delete | Duration of a DELETE operation against the Consul storage backend | ms | summary
vault.consul.list | Duration of a LIST operation against the Consul storage backend | ms | summary
vault.couchdb.put | Duration of a PUT operation against the CouchDB storage backend | ms | summary
vault.couchdb.get | Duration of a GET operation against the CouchDB storage backend | ms | summary
vault.couchdb.delete | Duration of a DELETE operation against the CouchDB storage backend | ms | summary
vault.couchdb.list | Duration of a LIST operation against the CouchDB storage backend | ms | summary
vault.dynamodb.put | Duration of a PUT operation against the DynamoDB storage backend | ms | summary
vault.dynamodb.get | Duration of a GET operation against the DynamoDB storage backend | ms | summary
vault.dynamodb.delete | Duration of a DELETE operation against the DynamoDB storage backend | ms | summary
vault.dynamodb.list | Duration of a LIST operation against the DynamoDB storage backend | ms | summary
vault.etcd.put | Duration of a PUT operation against the etcd storage backend | ms | summary
vault.etcd.get | Duration of a GET operation against the etcd storage backend | ms | summary
vault.etcd.delete | Duration of a DELETE operation against the etcd storage backend | ms | summary
vault.etcd.list | Duration of a LIST operation against the etcd storage backend | ms | summary
vault.gcs.put | Duration of a PUT operation against the Google Cloud Storage storage backend | ms | summary
vault.gcs.get | Duration of a GET operation against the Google Cloud Storage storage backend | ms | summary
vault.gcs.delete | Duration of a DELETE operation against the Google Cloud Storage storage backend | ms | summary
vault.gcs.list | Duration of a LIST operation against the Google Cloud Storage storage backend | ms | summary
vault.gcs.lock.unlock | Duration of an UNLOCK operation against the Google Cloud Storage storage backend in HA mode | ms | summary
vault.gcs.lock.lock | Duration of a LOCK operation against the Google Cloud Storage storage backend in HA mode | ms | summary
vault.gcs.lock.value | Duration of a VALUE operation against the Google Cloud Storage storage backend in HA mode | ms | summary
vault.mssql.put | Duration of a PUT operation against the MS-SQL storage backend | ms | summary
vault.mssql.get | Duration of a GET operation against the MS-SQL storage backend | ms | summary
vault.mssql.delete | Duration of a DELETE operation against the MS-SQL storage backend | ms | summary
vault.mssql.list | Duration of a LIST operation against the MS-SQL storage backend | ms | summary
vault.mysql.put | Duration of a PUT operation against the MySQL storage backend | ms | summary
vault.mysql.get | Duration of a GET operation against the MySQL storage backend | ms | summary
vault.mysql.delete | Duration of a DELETE operation against the MySQL storage backend | ms | summary
vault.mysql.list | Duration of a LIST operation against the MySQL storage backend | ms | summary
vault.postgres.put | Duration of a PUT operation against the PostgreSQL storage backend | ms | summary
vault.postgres.get | Duration of a GET operation against the PostgreSQL storage backend | ms | summary
vault.postgres.delete | Duration of a DELETE operation against the PostgreSQL storage backend | ms | summary
vault.postgres.list | Duration of a LIST operation against the PostgreSQL storage backend | ms | summary
vault.s3.put | Duration of a PUT operation against the Amazon S3 storage backend | ms | summary
vault.s3.get | Duration of a GET operation against the Amazon S3 storage backend | ms | summary
vault.s3.delete | Duration of a DELETE operation against the Amazon S3 storage backend | ms | summary
vault.s3.list | Duration of a LIST operation against the Amazon S3 storage backend | ms | summary
vault.spanner.put | Duration of a PUT operation against the Google Cloud Spanner storage backend | ms | summary
vault.spanner.get | Duration of a GET operation against the Google Cloud Spanner storage backend | ms | summary
vault.spanner.delete | Duration of a DELETE operation against the Google Cloud Spanner storage backend | ms | summary
vault.spanner.list | Duration of a LIST operation against the Google Cloud Spanner storage backend | ms | summary
vault.spanner.lock.unlock | Duration of an UNLOCK operation against the Google Cloud Spanner storage backend in HA mode | ms | summary
vault.spanner.lock.lock | Duration of a LOCK operation against the Google Cloud Spanner storage backend in HA mode | ms | summary
vault.spanner.lock.value | Duration of a VALUE operation against the Google Cloud Spanner storage backend in HA mode | ms | summary
vault.swift.put | Duration of a PUT operation against the Swift storage backend | ms | summary
vault.swift.get | Duration of a GET operation against the Swift storage backend | ms | summary
vault.swift.delete | Duration of a DELETE operation against the Swift storage backend | ms | summary
vault.swift.list | Duration of a LIST operation against the Swift storage backend | ms | summary
vault.zookeeper.put | Duration of a PUT operation against the ZooKeeper storage backend | ms | summary
vault.zookeeper.get | Duration of a GET operation against the ZooKeeper storage backend | ms | summary
vault.zookeeper.delete | Duration of a DELETE operation against the ZooKeeper storage backend | ms | summary
vault.zookeeper.list | Duration of a LIST operation against the ZooKeeper storage backend | ms | summary

Integrated Storage (Raft)

These metrics relate to Raft-based integrated storage.

Metric | Description | Unit | Type
vault.raft.apply | Number of Raft transactions occurring over the interval, which is a general indicator of the write load on the Raft servers. | raft transactions / interval | counter
vault.raft.barrier | Number of times the node has started the barrier, i.e., the number of times it has issued a blocking call to ensure that all pending queued operations have been applied to the node's FSM. | blocks / interval | counter
vault.raft.candidate.electSelf | Time to request a vote from a peer. | ms | summary
vault.raft.commitNumLogs | Number of logs processed for application to the FSM in a single batch. | logs | gauge
vault.raft.commitTime | Time to commit a new entry to the Raft log on the leader. | ms | timer
vault.raft.compactLogs | Time to trim the logs that are no longer needed. | ms | summary
vault.raft.delete | Time to delete a file from Raft's underlying storage. | ms | summary
vault.raft.delete_prefix | Time to delete files under a prefix from Raft's underlying storage. | ms | summary
vault.raft.fsm.apply | Number of logs committed since the last interval. | commit logs / interval | summary
vault.raft.fsm.applyBatch | Time to apply a batch of logs. | ms | summary
vault.raft.fsm.applyBatchNum | Number of logs applied in a batch. | ms | summary
vault.raft.fsm.enqueue | Time to enqueue a batch of logs for the FSM to apply. | ms | timer
vault.raft.fsm.restore | Time taken by the FSM to restore its state from a snapshot. | ms | summary
vault.raft.fsm.snapshot | Time taken by the FSM to record the current state for the snapshot. | ms | summary
vault.raft.fsm.store_config | Time to store the configuration. | ms | summary
vault.raft.get | Time to retrieve a file from Raft's underlying storage. | ms | summary
vault.raft.leader.dispatchLog | Time for the leader to write log entries to disk. | ms | timer
vault.raft.leader.dispatchNumLogs | Number of logs committed to disk in a batch. | logs | gauge
vault.raft.list | Time to retrieve a list of keys from Raft's underlying storage. | ms | summary
vault.raft.peers | Number of peers in the Raft cluster configuration. | peers | gauge
vault.raft.put | Time to persist a key in Raft's underlying storage. | ms | summary
vault.raft.replication.appendEntries.log | Number of logs replicated to a node, to bring it up to speed with the leader's logs. | logs appended / interval | counter
vault.raft.replication.appendEntries.rpc | Time taken by the append entries RPC to replicate the log entries of a leader node onto its follower node(s). | ms | timer
vault.raft.replication.heartbeat | Time taken to invoke appendEntries on a peer, which is done periodically so that the peer does not time out. | ms | timer
vault.raft.replication.installSnapshot | Time taken to process the installSnapshot RPC call. This metric should only be seen on nodes which are currently in the follower state. | ms | timer
vault.raft.restore | Number of times the restore operation has been performed by the node. Here, restore refers to the action of Raft consuming an external snapshot to restore its state. | operation invoked / interval | counter
vault.raft.restoreUserSnapshot | Time taken by the node to restore the FSM state from a user's snapshot. | ms | timer
vault.raft.rpc.appendEntries | Time taken to process an append entries RPC call from a node. | ms | timer
vault.raft.rpc.appendEntries.processLogs | Time taken to process the outstanding log entries of a node. | ms | timer
vault.raft.rpc.appendEntries.storeLogs | Time taken to add any outstanding logs for a node, since the last appendEntries was invoked. | ms | timer
vault.raft.rpc.installSnapshot | Time taken to process the installSnapshot RPC call. This metric should only be seen on nodes which are currently in the follower state. | ms | timer
vault.raft.rpc.processHeartbeat | Time taken to process a heartbeat request. | ms | timer
vault.raft.rpc.requestVote | Time taken to complete the requestVote RPC call. | ms | summary
vault.raft.snapshot.create | Time taken to initialize the snapshot process. | ms | timer
vault.raft.snapshot.persist | Time taken to dump the current snapshot taken by the node to the disk. | ms | timer
vault.raft.snapshot.takeSnapshot | Total time involved in taking the current snapshot (creating one and persisting it) by the node. | ms | timer
vault.raft.state.follower | Number of times the node has entered the follower mode. This happens when a new node joins the cluster or after the end of a leader election. | follower state entered / interval | counter
vault.raft.transition.heartbeat_timeout | Number of times the node has transitioned to the Candidate state after receiving no heartbeat messages from the last known leader. | timeouts / interval | counter
vault.raft.transition.leader_lease_timeout | Number of times a quorum of nodes could not be contacted. | contact failures | counter
vault.raft.verify_leader | Number of times the node checks whether it is still the leader. | checks / interval | counter
vault.raft-storage.delete | Time to insert log entry to delete path. | ms | timer
vault.raft-storage.get | Time to retrieve value for path from FSM. | ms | timer
vault.raft-storage.put | Time to insert log entry to persist path. | ms | timer
vault.raft-storage.list | Time to list all entries under the prefix from the FSM. | ms | timer
vault.raft-storage.transaction | Time to insert operations into a single log. | ms | timer
vault.raft-storage.entry_size | The total size of a Raft entry during log application in bytes. | bytes | summary
vault.raft_storage.bolt.freelist.free_pages | Number of free pages in the freelist. | pages | gauge
vault.raft_storage.bolt.freelist.pending_pages | Number of pending pages in the freelist. | pages | gauge
vault.raft_storage.bolt.freelist.allocated_bytes | Total bytes allocated in free pages. | bytes | gauge
vault.raft_storage.bolt.freelist.used_bytes | Total bytes used by the freelist. | bytes | gauge
vault.raft_storage.bolt.transaction.started_read_transactions | Number of started read transactions. | transactions | gauge
vault.raft_storage.bolt.transaction.currently_open_read_transactions | Number of currently open read transactions. | transactions | gauge
vault.raft_storage.bolt.page.count | Number of page allocations. | allocations | gauge
vault.raft_storage.bolt.page.bytes_allocated | Total bytes allocated. | bytes | gauge
vault.raft_storage.bolt.cursor.count | Number of cursors created. | cursors | gauge
vault.raft_storage.bolt.node.count | Number of node allocations. | nodes | gauge
vault.raft_storage.bolt.node.dereferences | Number of node dereferences. | dereferences | gauge
vault.raft_storage.bolt.rebalance.count | Number of node rebalances. | rebalances | gauge
vault.raft_storage.bolt.rebalance.time | Time taken rebalancing. | ms | summary
vault.raft_storage.bolt.split.count | Number of nodes split. | nodes | gauge
vault.raft_storage.bolt.spill.count | Number of nodes spilled. | nodes | gauge
vault.raft_storage.bolt.spill.time | Time taken spilling. | ms | summary
vault.raft_storage.bolt.write.count | Number of writes performed. | writes | gauge
vault.raft_storage.bolt.write.time | Time taken writing to disk. | ms | summary
vault.raft_storage.stats.commit_index | Index of last Raft log committed to disk on this node. | sequence number | gauge
vault.raft_storage.stats.applied_index | Highest index of Raft log either applied to the FSM or added to fsm_pending queue. | sequence number | gauge
vault.raft_storage.stats.fsm_pending | Number of Raft logs this node has queued to be applied by the FSM. | logs | gauge
vault.raft_storage.follower.applied_index_delta | Delta between leader applied index and each follower's applied index reported by echoes. | logs | gauge
vault.raft_storage.follower.last_heartbeat_ms | Time since last echo request received by each follower. | ms | gauge

Integrated Storage (Raft) Autopilot

Metric | Description | Unit | Type
vault.autopilot.node.healthy | Set to 1 if the node_id is deemed healthy by Autopilot, 0 if not | bool | gauge
vault.autopilot.healthy | Set to 1 if Autopilot considers all nodes healthy | bool | gauge
vault.autopilot.failure_tolerance | How many nodes can be lost while maintaining quorum, i.e., number of healthy nodes in excess of quorum | nodes | gauge

Since Autopilot runs only on the active node, these metrics are emitted by the active node only.
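
The failure_tolerance description ("number of healthy nodes in excess of quorum") can be illustrated with a small calculation. This is only a sketch: it assumes quorum is a simple majority of voting nodes, and the function names are hypothetical. Autopilot's own calculation may account for details (such as non-voting nodes) that are not modeled here.

```python
def quorum_size(total_voters: int) -> int:
    """Simple-majority quorum (assumption: every node counted is a voter)."""
    return total_voters // 2 + 1


def failure_tolerance(healthy_voters: int, total_voters: int) -> int:
    """Healthy nodes in excess of quorum, floored at zero."""
    return max(0, healthy_voters - quorum_size(total_voters))


if __name__ == "__main__":
    # Five-node cluster with every node healthy: quorum is 3.
    print(failure_tolerance(healthy_voters=5, total_voters=5))
    # Same cluster after one node becomes unhealthy.
    print(failure_tolerance(healthy_voters=4, total_voters=5))
```

Under these assumptions, a value of 0 means the loss of one more healthy node would cost the cluster its quorum.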

Integrated Storage (Raft) Leadership Changes

Metric | Description | Unit | Type
vault.raft.leader.lastContact | Measures the time since the leader was last able to contact the follower nodes when checking its leader lease | ms | summary
vault.raft.state.candidate | Increments whenever raft server starts an election | Elections | counter
vault.raft.state.leader | Increments whenever raft server becomes a leader | Leaders | counter

Why are they vital?: Frequent elections or leadership changes likely indicate network issues between the Raft nodes, or that the Raft servers cannot keep up with the load.

What to look for: In a healthy cluster, lastContact stays below 200 ms, leader is greater than 0, and candidate equals 0. Deviations from these values might indicate flapping leadership.
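
The three conditions above can be turned into a simple check, for instance inside an alerting script. The sketch below is not part of Vault: the RaftLeadershipSample container, the function name, and the warning strings are hypothetical, and it assumes the three metric values have already been read from whatever metrics aggregation software is in use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RaftLeadershipSample:
    """One reading of the leadership metrics (hypothetical container)."""
    last_contact_ms: float  # vault.raft.leader.lastContact
    leader_count: int       # vault.raft.state.leader
    candidate_count: int    # vault.raft.state.candidate


def leadership_warnings(sample: RaftLeadershipSample,
                        max_last_contact_ms: float = 200.0) -> List[str]:
    """Return one warning per condition that deviates from the healthy values."""
    warnings = []
    if sample.last_contact_ms >= max_last_contact_ms:
        warnings.append("lastContact at or above threshold")
    if sample.leader_count <= 0:
        warnings.append("leader is not greater than 0")
    if sample.candidate_count != 0:
        warnings.append("candidate is not 0")
    return warnings


if __name__ == "__main__":
    healthy = RaftLeadershipSample(last_contact_ms=35.0, leader_count=1, candidate_count=0)
    flapping = RaftLeadershipSample(last_contact_ms=450.0, leader_count=0, candidate_count=3)
    print(leadership_warnings(healthy))
    print(leadership_warnings(flapping))
```

An empty list corresponds to the healthy profile described above; any entry is a prompt to investigate network latency between nodes or load on the Raft servers.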

Integrated Storage (Raft) Automated Snapshots

These metrics relate to the Enterprise feature Raft Automated Snapshots.

Metric | Description | Unit | Type
vault.autosnapshots.total.snapshot.size | For storage_type=local, space on disk used by saved snapshots | bytes | gauge
vault.autosnapshots.percent.maxspace.used | For storage_type=local, percent used of maximum allocated space | percentage | gauge
vault.autosnapshots.save.errors | Increments whenever an error occurs trying to save a snapshot | n/a | counter
vault.autosnapshots.save.duration | Measures the time taken saving a snapshot | ms | summary
vault.autosnapshots.last.success.time | Epoch time (seconds since 1970/01/01) of last successful snapshot save | n/a | gauge
vault.autosnapshots.snapshot.size | Measures the size in bytes of snapshots | bytes | summary
vault.autosnapshots.rotate.duration | Measures the time taken to rotate (i.e. delete) old snapshots to satisfy configured retention | ms | summary
vault.autosnapshots.snapshots.in.storage | Number of snapshots in storage | n/a | gauge

Metric Labels

Label | Description | Example
auth_method | Auth method type. | userpass
cluster | The cluster name from which the metric originated; set in the configuration file, or automatically generated when a cluster is created. | vault-cluster-d54ad07
creation_ttl | Time-to-live value assigned to a token or lease at creation. This value is rounded up to the next-highest bucket; the available buckets are 1m, 10m, 20m, 1h, 2h, 1d, 2d, 7d, and 30d. Any longer TTL is assigned the value +Inf. | 7d
mount_point | Path at which an auth method or secret engine is mounted. | auth/userpass/
namespace | A namespace path, or root for the root namespace. | ns1
policy | A single named policy. | default
secret_engine | The secret engine type. | aws
token_type | Identifies whether the token is a batch token or a service token. | service
peer_id | Unique identifier of a Raft peer. | node-1
node_id | Unique identifier of a Raft peer, same as peer_id. | node-1
snapshot_config_name | For automated snapshots, the name of the configuration. | config1
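
The creation_ttl rounding described in the table can be sketched as a lookup over the listed buckets. This is an illustration rather than Vault's implementation: the function name is hypothetical, and it assumes a TTL exactly equal to a bucket boundary falls into that bucket.

```python
# Bucket labels from the creation_ttl description, with upper bounds in seconds.
TTL_BUCKETS = [
    ("1m", 60),
    ("10m", 10 * 60),
    ("20m", 20 * 60),
    ("1h", 60 * 60),
    ("2h", 2 * 60 * 60),
    ("1d", 24 * 60 * 60),
    ("2d", 2 * 24 * 60 * 60),
    ("7d", 7 * 24 * 60 * 60),
    ("30d", 30 * 24 * 60 * 60),
]


def creation_ttl_label(ttl_seconds: float) -> str:
    """Round a TTL up to the next-highest bucket label, or +Inf if none fits."""
    for label, upper_bound in TTL_BUCKETS:
        # Assumption: a TTL equal to the boundary belongs to that bucket.
        if ttl_seconds <= upper_bound:
            return label
    return "+Inf"


if __name__ == "__main__":
    for ttl in (30, 3600, 3601, 5 * 24 * 3600, 31 * 24 * 3600):
        print(ttl, creation_ttl_label(ttl))
```

Knowing the bucket boundaries helps when grouping vault.secret.lease.creation by creation_ttl: a lease with a five-day TTL is reported under 7d, not under a label of its own.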