Core system telemetry
Core system telemetry provides information about the operational health of your Vault instance.
Default metrics
vault.core.active
| Metric type | Value | Description |
|---|---|---|
| gauge | boolean | Indicates whether the Vault node is active |
- A value of 1 indicates that the node is active.
- A value of 0 indicates that the node is in standby.
vault.core.activity.fragment_size
| Metric type | Value | Description |
|---|---|---|
| counter | number | Number of type objects observed by the local node |
The fragment size metric includes labels to indicate if the objects counted were entities or tokens.
vault.core.activity.segment_write
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to write activity log segments to storage |
vault.core.check_token
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to complete a token check |
vault.core.fetch_acl_and_token
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to fetch ACL and token entries |
vault.core.handle_login_request
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to complete a login request |
vault.core.handle_request
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to complete a non-login request |
vault.core.in_flight_requests
| Metric type | Value | Description |
|---|---|---|
| gauge | requests | Number of requests currently in progress |
vault.core.leadership_lost
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Total time that a high-availability cluster node last maintained leadership |
Leadership time updates occur whenever leadership changes. Frequent updates to
vault.core.leadership_lost with low leadership times indicate flapping as
leader status rotates between nodes.
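As a rough illustration of the flapping check described above, the following Python sketch flags flapping when several recently reported leadership terms are short. The function name, the 60-second cutoff, and the minimum count of short terms are illustrative assumptions, not Vault defaults.

```python
from typing import Sequence


def is_flapping(
    leadership_ms: Sequence[float],
    min_term_ms: float = 60_000,  # assumed cutoff for a "short" leadership term
    min_short_terms: int = 3,     # assumed number of short terms that counts as flapping
) -> bool:
    """Return True when enough recent leadership terms were shorter than the cutoff."""
    short_terms = [term for term in leadership_ms if term < min_term_ms]
    return len(short_terms) >= min_short_terms


# Hypothetical vault.core.leadership_lost samples, in milliseconds.
recent_terms = [4_200, 9_800, 3_100, 7_500]
print(is_flapping(recent_terms))
```

Tune both thresholds against the leadership behavior you normally observe in your own cluster.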
vault.core.leadership_setup_failed
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time taken by the most recent leadership setup failure |
Setup failure time is an important health metric for your high-availability
Vault installation. We strongly recommend that you closely monitor
vault.core.leadership_setup_failed and set alerts that keep you informed of
the overall cluster leadership status.
vault.core.license.expiration_time_epoch
| Metric type | Value | Description |
|---|---|---|
| gauge | timestamp | Epoch time (seconds since 1970-01-01) at which the license will expire |
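Because the gauge reports an absolute epoch timestamp, alerting usually needs the remaining time instead. The Python sketch below converts the gauge value to days remaining. The function name and the sample values are hypothetical.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def days_until_expiry(expiration_epoch: float, now: Optional[float] = None) -> float:
    """Return the days left before the license expires (negative once expired)."""
    current = time.time() if now is None else now
    return (expiration_epoch - current) / SECONDS_PER_DAY


# Hypothetical gauge value that lies 30 days after a fixed reference time.
reference_time = 1_700_000_000
print(days_until_expiry(reference_time + 30 * SECONDS_PER_DAY, now=reference_time))
```

A negative result means the license has already expired.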
vault.core.locked_users
| Metric type | Value | Description |
|---|---|---|
| gauge | users | The number of users currently locked out of Vault |
The number of locked users refreshes every 15 minutes.
vault.core.mount_table.num_entries
| Metric type | Value | Description |
|---|---|---|
| gauge | objects | Number of mounts in the given mount table |
Mountpoint count metrics include labels to indicate whether the relevant table is an authentication table or a logical table and whether the table is replicated or local.
vault.core.mount_table.size
| Metric type | Value | Description |
|---|---|---|
| gauge | bytes | The current size of the relevant mount table. |
Table size metrics include labels to indicate whether the relevant table is an authentication table or a logical table and whether the table is replicated or local.
vault.core.performance_standby
| Metric type | Value | Description |
|---|---|---|
| gauge | boolean | Indicates whether the reporting node is a performance standby |
- A value of 1 indicates the node is a performance standby.
- A value of 0 indicates the node is not a performance standby.
vault.core.replication.dr.primary
| Metric type | Value | Description |
|---|---|---|
| gauge | boolean | Indicates whether the Vault node is a disaster recovery primary |
- A value of 1 indicates that the node is a disaster recovery primary.
- A value of 0 indicates that the node is not a disaster recovery primary.
vault.core.replication.dr.secondary
| Metric type | Value | Description |
|---|---|---|
| gauge | boolean | Indicates whether the Vault node is a disaster recovery secondary |
- A value of 1 indicates that the node is a disaster recovery secondary.
- A value of 0 indicates that the node is not a disaster recovery secondary.
vault.core.replication.performance.primary
| Metric type | Value | Description |
|---|---|---|
| gauge | boolean | Indicates whether the Vault node is a performance primary |
- A value of 1 indicates that the node is a performance primary.
- A value of 0 indicates that the node is not a performance primary.
vault.core.replication.performance.secondary
| Metric type | Value | Description |
|---|---|---|
| gauge | boolean | Indicates whether the Vault node is a performance secondary |
- A value of 1 indicates that the node is a performance secondary.
- A value of 0 indicates that the node is not a performance secondary.
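The four replication gauges above can be combined into a single readable summary for a node. The Python sketch below is one possible way to do that. The role labels, the function name, and the input dictionary shape are assumptions made for illustration.

```python
from typing import Dict, List

# Gauge names from this section mapped to illustrative role labels.
ROLE_GAUGES = {
    "vault.core.replication.dr.primary": "DR primary",
    "vault.core.replication.dr.secondary": "DR secondary",
    "vault.core.replication.performance.primary": "performance primary",
    "vault.core.replication.performance.secondary": "performance secondary",
}


def replication_roles(gauges: Dict[str, float]) -> List[str]:
    """Return the role labels whose gauge currently reports 1."""
    return [role for name, role in ROLE_GAUGES.items() if gauges.get(name, 0) == 1]


# Hypothetical scrape from one node.
scrape = {
    "vault.core.replication.dr.primary": 1,
    "vault.core.replication.performance.secondary": 1,
}
print(replication_roles(scrape))
```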
vault.core.replication.write_undo_logs
| Metric type | Value | Description |
|---|---|---|
| gauge | boolean | Indicates whether undo logs are enabled |
- A value of 1 indicates that Vault is generating undo logs.
- A value of 0 indicates that Vault is not generating undo logs.
vault.core.step_down
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to step down cluster leadership |
Barrier metrics
vault.barrier.delete
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to complete a DELETE operation at the barrier |
vault.barrier.get
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to complete a GET operation at the barrier |
vault.barrier.list
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to complete a LIST operation at the barrier |
vault.barrier.put
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to complete a PUT operation at the barrier |
Caching metrics
vault.cache.delete
| Metric type | Value | Description |
|---|---|---|
| counter | number | Number of deletes from the LRU cache |
vault.cache.hit
| Metric type | Value | Description |
|---|---|---|
| counter | number | Number of hits against the LRU cache that avoided a read from configured storage |
vault.cache.miss
| Metric type | Value | Description |
|---|---|---|
| counter | number | Number of misses against the LRU cache that required a read from configured storage |
vault.cache.write
| Metric type | Value | Description |
|---|---|---|
| counter | number | Number of writes to the LRU cache |
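The hit and miss counters are often easier to interpret as a ratio. The Python sketch below computes a cache hit ratio from the counter increases observed over some interval. The function name and the sample numbers are hypothetical.

```python
from typing import Optional


def cache_hit_ratio(hits: int, misses: int) -> Optional[float]:
    """Return hits / (hits + misses), or None when there were no lookups."""
    lookups = hits + misses
    if lookups == 0:
        return None
    return hits / lookups


# Hypothetical increases in vault.cache.hit and vault.cache.miss over one interval.
print(cache_hit_ratio(900, 100))
```

A falling ratio means more reads are reaching configured storage.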
Metric collection metrics
vault.metrics.collection
| Metric type | Value | Description |
|---|---|---|
| summary | ms | The average time required (per gauge type) to collect usage data |
vault.metrics.collection.error
| Metric type | Value | Description |
|---|---|---|
| counter | number | The total number of errors (per gauge type) that Vault encountered while collecting usage data |
vault.metrics.collection.interval
| Metric type | Value | Description |
|---|---|---|
| summary | time duration | The current value of usage_gauge_period |
Quota metrics
Quota metrics relate to rate limit and lease count quotas. Each metric comes
with a name label that identifies the specific quota.
vault.quota.lease_count.counter
| Metric type | Value | Description |
|---|---|---|
| gauge | lease | Total number of leases associated with the named quota rule |
The number of leases reported is specific to the quota rule listed in the name
label, not the number of leases in general. For example, if the named rule
allows for 50 leases max and there are currently 40 leases in the scope of that
quota rule, the value of vault.quota.lease_count.counter is 40 even if there
are 1000 other leases that are unscoped or in the scope of other quota rules.
vault.quota.lease_count.max
| Metric type | Value | Description |
|---|---|---|
| gauge | lease | Maximum number of leases allowed by the named quota rule |
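Comparing the counter gauge with the max gauge for the same name label shows how close a quota rule is to its limit. The Python sketch below computes that utilization using the 40-of-50 example from the counter description. The function name is hypothetical.

```python
from typing import Optional


def lease_quota_utilization(current_leases: float, max_leases: float) -> Optional[float]:
    """Return the fraction of the quota in use, or None when no maximum is reported."""
    if max_leases <= 0:
        return None
    return current_leases / max_leases


# vault.quota.lease_count.counter = 40, vault.quota.lease_count.max = 50
print(lease_quota_utilization(40, 50))
```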
vault.quota.lease_count.violation
| Metric type | Value | Description |
|---|---|---|
| counter | number | Number of requests rejected due to exceeding the named lease count quota |
vault.quota.rate_limit.violation
| Metric type | Value | Description |
|---|---|---|
| counter | number | Number of requests rejected due to exceeding the named rate limit quota rule |
Request limiter metrics
Request limiter metrics relate to request success signals observed by the request limiter and its current state. Note that the request limiter is deprecated and will be removed in a future Vault version.
vault.core.limits.concurrency.write
| Metric type | Value | Description |
|---|---|---|
| gauge | number | Current number of allowed in-flight write requests |
vault.core.limits.concurrency.special_path
| Metric type | Value | Description |
|---|---|---|
| gauge | number | Current number of allowed in-flight special-path requests |
vault.core.limits.concurrency.service_unavailable
| Metric type | Value | Description |
|---|---|---|
| counter | number | Number of requests rejected by the request limiter |
vault.core.limits.concurrency.success
| Metric type | Value | Description |
|---|---|---|
| counter | number | Number of successful requests observed by the request limiter |
vault.core.limits.concurrency.dropped
| Metric type | Value | Description |
|---|---|---|
| counter | number | Number of significant request errors observed by the request limiter |
vault.core.limits.concurrency.ignored
| Metric type | Value | Description |
|---|---|---|
| counter | number | Number of ignored request errors observed by the request limiter |
Ignored request errors result from early request cancellation. These errors are discarded from request limiter measurements to prevent skewing of latency measurements.
Rollback metrics
By default, Vault does not separate rollback metrics by mountpoint. To enable
explicitly named metrics for mountpoints, enable the
add_mount_point_rollback_metrics option in your telemetry configuration stanza.
If you enable named metrics for mountpoints, the metric name converts forward
slashes (/) in mount names to dashes (-). For example, if you have the
auth/token backend configured and mountpoint names enabled for telemetry,
the corresponding mount point metric string is auth-token.
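The name conversion described above can be sketched in a few lines of Python. The function name is hypothetical, and the sketch covers only the slash-to-dash rule stated here.

```python
def mount_metric_suffix(mount_name: str) -> str:
    """Replace forward slashes in a mount name with dashes."""
    return mount_name.replace("/", "-")


# Build the per-mount rollback metric name for the auth/token example.
print("vault.rollback.attempt." + mount_metric_suffix("auth/token"))
# prints: vault.rollback.attempt.auth-token
```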
vault.rollback.attempt.{MOUNTPOINT}
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to perform a rollback operation on the given mount point |
vault.rollback.attempt
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to perform a rollback operation |
vault.rollback.inflight
| Metric type | Value | Description |
|---|---|---|
| gauge | number | Number of rollback operations inflight |
vault.rollback.queued
| Metric type | Value | Description |
|---|---|---|
| gauge | number | The number of rollback operations waiting to be started |
vault.rollback.waiting
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time between queueing a rollback operation and the operation starting |
Route metrics
Vault reports mount-specific route metrics for each configured mount point.
Metric names convert forward slashes (/) in mount names to dashes (-). For
example, if you have the auth/token backend configured, the corresponding
mount point metric string is auth-token.
By default, Vault does not separate rollback metrics by mountpoint. To enable
explicitly named metrics for mountpoints, enable the
add_mount_point_rollback_metrics option in your telemetry configuration stanza.
vault.route.create.{MOUNTPOINT}
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to send a create request to the backend and for the backend to complete the operation for the given mount point |
vault.route.delete.{MOUNTPOINT}
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to send a delete request to the backend and for the backend to complete the operation for the given mount point |
vault.route.list.{MOUNTPOINT}
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to send a list request to the backend and for the backend to complete the operation for the given mount point |
vault.route.read.{MOUNTPOINT}
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to send a read request to the backend and for the backend to complete the operation for the given mount point |
vault.route.rollback.{MOUNTPOINT}
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to send a rollback request to the backend and for the backend to complete the operation for the given mount point |
Vault automatically schedules and performs mount point rollback operations to clean up partial errors.
vault.route.rollback
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to send a rollback request to the backend and for the backend to complete the operation |
Vault automatically schedules and performs mount point rollback operations to clean up partial errors.
Runtime metrics
Runtime metrics relate specifically to the Go runtime for your Vault instance.
vault.runtime.alloc_bytes
| Metric type | Value | Description |
|---|---|---|
| gauge | bytes | Space currently allocated to Vault processes |
The number of allocated bytes may peak from time to time, but should always return to a steady state value in a healthy Vault installation.
vault.runtime.free_count
| Metric type | Value | Description |
|---|---|---|
| gauge | number | Number of freed objects |
vault.runtime.gc_pause_ns
| Metric type | Value | Description |
|---|---|---|
| summary | ns | Time required to complete the last garbage collection run |
vault.runtime.heap_objects
| Metric type | Value | Description |
|---|---|---|
| gauge | number | Total number of objects on the heap in memory |
The vault.runtime.heap_objects metric is a good memory pressure indicator. We
recommend monitoring vault.runtime.heap_objects to establish an accurate
baseline and thresholds for alerting on the health of your Vault installation.
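One simple way to turn a baseline into an alert threshold is to take the mean of historical samples plus a few standard deviations. The Python sketch below shows that approach. The function name, the three-sigma default, and the sample values are assumptions, not a recommendation from Vault.

```python
from statistics import mean, pstdev
from typing import Sequence


def alert_threshold(baseline_samples: Sequence[float], sigmas: float = 3.0) -> float:
    """Return mean + sigmas * population standard deviation of the baseline."""
    return mean(baseline_samples) + sigmas * pstdev(baseline_samples)


# Hypothetical vault.runtime.heap_objects samples taken during normal operation.
baseline = [410_000, 395_000, 402_000, 420_000, 398_000]
print(alert_threshold(baseline))
```

The same calculation applies to any gauge for which you collect a baseline. Workloads with strong daily or weekly cycles may need a separate baseline per period.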
vault.runtime.malloc_count
| Metric type | Value | Description |
|---|---|---|
| gauge | number | Total number of allocated heap objects in memory |
vault.runtime.num_goroutines
| Metric type | Value | Description |
|---|---|---|
| gauge | number | Total number of goroutines running in memory |
The vault.runtime.num_goroutines metric is a good system load indicator. We
recommend monitoring vault.runtime.num_goroutines to establish an accurate
baseline and thresholds for alerting on the health of your Vault installation.
vault.runtime.sys_bytes
| Metric type | Value | Description |
|---|---|---|
| gauge | bytes | Total number of bytes allocated to Vault |
The total number of allocated system bytes includes space currently used by the heap plus space that the Go runtime has reclaimed but not yet returned to the operating system.
vault.runtime.total_gc_pause_ns
| Metric type | Value | Description |
|---|---|---|
| gauge | ns | The total garbage collector pause time since Vault was last started |
vault.runtime.total_gc_runs
| Metric type | Value | Description |
|---|---|---|
| gauge | number | The total number of garbage collection runs since Vault was last started |
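Dividing the total pause time by the total number of runs gives an average pause per garbage collection run since Vault started. The Python sketch below performs that calculation and converts the result to milliseconds. The function name and the sample values are hypothetical.

```python
from typing import Optional

NS_PER_MS = 1_000_000


def avg_gc_pause_ms(total_gc_pause_ns: float, total_gc_runs: float) -> Optional[float]:
    """Return the mean pause per GC run in milliseconds, or None before the first run."""
    if total_gc_runs <= 0:
        return None
    return total_gc_pause_ns / total_gc_runs / NS_PER_MS


# Hypothetical gauge values: 5 ms of total pause time across 10 runs.
print(avg_gc_pause_ms(5_000_000, 10))
```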
Seal metrics
vault.core.post_unseal
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to complete post-unseal operations |
vault.core.pre_seal
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to complete pre-seal operations |
vault.core.seal.encrypt
| Metric type | Value | Description |
|---|---|---|
| counter | number | The number of times a seal-wrapped value has been encrypted |
vault.core.seal.encrypt.time
| Metric type | Value | Description |
|---|---|---|
| summary | ms | The time taken to encrypt a seal-wrapped value |
vault.core.seal.decrypt
| Metric type | Value | Description |
|---|---|---|
| counter | number | The number of times a seal-wrapped value has been decrypted |
vault.core.seal.decrypt.time
| Metric type | Value | Description |
|---|---|---|
| summary | ms | The time taken to decrypt a seal-wrapped value |
vault.core.seal-internal
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to complete internal Vault seal operations |
vault.core.seal.unreachable.time
| Metric type | Value | Description |
|---|---|---|
| summary | ms | The total time a seal has been unreachable by health check. |
vault.core.seal-with-request
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to complete seal operations that were triggered by explicit request |
vault.core.unseal
| Metric type | Value | Description |
|---|---|---|
| summary | ms | Time required to complete unseal operations |
vault.core.unsealed
| Metric type | Value | Description |
|---|---|---|
| gauge | boolean | Indicates whether Vault is currently unsealed |
- A value of 1 indicates Vault is currently unsealed and clients can read secrets.
- A value of 0 indicates Vault is currently sealed and clients cannot read secrets.