Request Limiter
Appropriate Vault Enterprise license required
Beta (Deprecated)
The request limiter was released in Vault 1.16 as a Beta feature. During Beta evaluation we found an alternative approach better met the needs of our users. This feature will be removed from Vault in a future release. It is replaced with adaptive overload protection.
This document contains conceptual information about the Request Limiter and its user-facing effects.
Preventing overload
The Request Limiter aims to prevent overload by proactively detecting latency deviation from a baseline and adapting the number of allowed in-flight requests.
This is done in two phases at the beginning of an HTTP request:
Consult the current number of allowed in-flight requests. If the new request would exceed this limit, immediately reject it, indicating that the client should retry later.
If the request is allowed, begin a measurement of its latency, allowing the Request Limiter to calculate a new limit.
Resource constraints
The Request Limiter intentionally focuses on preventing overload derived from resource-constrained operations on the Vault server. Vault focuses on two specific types of resource constraints which commonly cause issues in production workloads:
Write latency in the storage backend, resulting in a growing queue of updates to be flushed. These writes originate primarily from
Write
-based HTTP methods.CPU utilization caused by computationally expensive PKI issue requests (generally for RSA keys). Large numbers of these requests can consume all CPU resources, preventing timely processing of other requests such as heartbeats and health checks.
Storage constraints can be accounted for by limiting logical requests according
to their http.Method
. We only measure and limit requests with Write
-based
HTTP methods. Read requests do not generally cause storage updates, meaning that
their latencies are unlikely to be correlated with storage constraints.
CPU constraints are accounted for using the same underlying library and technique; however, they require special treatment. The maximum number of concurrent pki/issue requests found in testing (again, specifically for RSA keys) is far lower than the minimum tolerable write request rate.
In both cases, utilization will be effectively throttled before Vault reaches
any degraded state. The resulting 503 - Service Unavailable
is a retryable
HTTP response code, which can be handled to gracefully retry and eventually
succeed. Clients should handle this by retrying with jitter and exponential
backoff. This is done within Vault's API Client
implementation, using the
go-retryablehttp library.
Read requests
HTTP methods such as GET
and LIST
are not subject to write request
limiting. This allows operators to continue querying server state without
needing to retry.
Vault server overloaded
When Vault has reached capacity, new requests will be immediately rejected with a
retryable 503 - Service Unavailable
error.