Request Limiter

Appropriate Vault Enterprise license required

Beta (Deprecated)

The request limiter was released in Vault 1.16 as a Beta feature. During Beta evaluation we found an alternative approach better met the needs of our users. This feature will be removed from Vault in a future release. It is replaced with adaptive overload protection.

This document contains conceptual information about the Request Limiter and its user-facing effects.

Preventing overload

The Request Limiter aims to prevent overload by proactively detecting latency deviation from a baseline and adapting the number of allowed in-flight requests.

This is done in two phases at the beginning of an HTTP request:

1. Consult the current number of allowed in-flight requests. If the new request would exceed this limit, reject it immediately, indicating that the client should retry later.
2. If the request is allowed, begin measuring its latency so that the Request Limiter can calculate a new limit.
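The two phases can be sketched in Go as follows. This is a minimal illustration, not Vault's implementation: the Limiter type, the ErrOverloaded error, the tolerance factor of 2, and the "add one, or halve" update rule are all assumptions chosen to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrOverloaded is a hypothetical error that a server would map to a 503 response.
var ErrOverloaded = errors.New("limiter: too many in-flight requests")

// Limiter is an illustrative adaptive concurrency limiter.
type Limiter struct {
	mu                 sync.Mutex
	limit, inFlight    int
	minLimit, maxLimit int
	baseline           time.Duration // lowest latency observed so far
}

func NewLimiter(initial, minLimit, maxLimit int) *Limiter {
	return &Limiter{limit: initial, minLimit: minLimit, maxLimit: maxLimit}
}

// Acquire is phase one: reject immediately if the limit is already reached.
// On success it starts the latency measurement (phase two) and returns a
// done function that the caller must invoke exactly once when the request ends.
func (l *Limiter) Acquire() (done func(), err error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.inFlight >= l.limit {
		return nil, ErrOverloaded
	}
	l.inFlight++
	start := time.Now()
	return func() {
		l.mu.Lock()
		defer l.mu.Unlock()
		l.inFlight--
		l.adjust(time.Since(start))
	}, nil
}

// adjust recalculates the limit from one latency sample. The caller holds l.mu.
// Assumed rule: halve the limit when latency exceeds twice the baseline,
// otherwise raise it by one, always staying within [minLimit, maxLimit].
func (l *Limiter) adjust(latency time.Duration) {
	if l.baseline == 0 || latency < l.baseline {
		l.baseline = latency
	}
	if latency > 2*l.baseline {
		l.limit /= 2
		if l.limit < l.minLimit {
			l.limit = l.minLimit
		}
	} else if l.limit < l.maxLimit {
		l.limit++
	}
}

func main() {
	l := NewLimiter(1, 1, 8)
	done, err := l.Acquire()
	fmt.Println("first request admitted:", err == nil)
	_, err = l.Acquire()
	fmt.Println("second request rejected:", errors.Is(err, ErrOverloaded))
	done()
}
```

Returning a done function from Acquire keeps the admission check and the latency measurement tied to the same request, so a caller cannot record a latency for a request that was never admitted.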

Resource constraints

The Request Limiter intentionally focuses on preventing overload derived from resource-constrained operations on the Vault server. It targets two specific types of resource constraint that commonly cause issues in production workloads:

Write latency in the storage backend, resulting in a growing queue of updates to be flushed. These writes originate primarily from Write-based HTTP methods.
CPU utilization caused by computationally expensive PKI issue requests (generally for RSA keys). Large numbers of these requests can consume all CPU resources, preventing timely processing of other requests such as heartbeats and health checks.

Storage constraints can be accounted for by limiting logical requests according to their HTTP method. We only measure and limit requests with write-based HTTP methods. Read requests generally do not cause storage updates, so their latencies are unlikely to be correlated with storage constraints.
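A method-based filter of this kind might look like the sketch below. The function name isWriteMethod is hypothetical, and the exact set of methods treated as write-based is an assumption; the text above does not enumerate them.

```go
package main

import (
	"fmt"
	"net/http"
)

// isWriteMethod reports whether a request with the given HTTP method would be
// measured and limited. Assumption: POST, PUT, PATCH and DELETE count as
// write-based; everything else (including GET and Vault's LIST) is not limited.
func isWriteMethod(method string) bool {
	switch method {
	case http.MethodPost, http.MethodPut, http.MethodPatch, http.MethodDelete:
		return true
	default:
		return false
	}
}

func main() {
	for _, m := range []string{"GET", "LIST", "PUT", "POST"} {
		fmt.Printf("%-4s limited=%v\n", m, isWriteMethod(m))
	}
}
```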

CPU constraints are accounted for using the same underlying library and technique; however, they require special treatment. The maximum number of concurrent pki/issue requests found in testing (again, specifically for RSA keys) is far lower than the minimum tolerable write request rate.

In both cases, utilization is throttled before Vault reaches a degraded state. The resulting 503 - Service Unavailable is a retryable HTTP response code. Clients should handle it by retrying with jitter and exponential backoff until the request eventually succeeds. Vault's API client implementation already does this, using the go-retryablehttp library.
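For clients that do not use Vault's API client, the retry behavior can be approximated with the standard library alone. The sketch below is an assumption-laden illustration: backoffDelay, retryOn503 and noSleep are hypothetical names, the base and maximum delays are arbitrary, and the jitter scheme (a random point in the upper half of the exponential ceiling) is one common choice that may differ from what go-retryablehttp does.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"net/http"
	"time"
)

// backoffDelay returns the wait before retry number attempt (0-based).
// The ceiling doubles per attempt, capped at max; jitter in [0, 1) then
// picks a delay in [ceiling/2, ceiling).
func backoffDelay(attempt int, base, max time.Duration, jitter float64) time.Duration {
	ceiling := base
	for i := 0; i < attempt && ceiling < max; i++ {
		ceiling *= 2
	}
	if ceiling > max {
		ceiling = max
	}
	return time.Duration(float64(ceiling) * (0.5 + jitter/2))
}

// retryOn503 calls do until it returns a status other than 503 or until
// maxAttempts calls have been made. sleep is injected so callers control waiting.
func retryOn503(maxAttempts int, do func() (int, error), sleep func(time.Duration)) (int, error) {
	for attempt := 0; ; attempt++ {
		status, err := do()
		if err != nil {
			return 0, err
		}
		if status != http.StatusServiceUnavailable {
			return status, nil
		}
		if attempt+1 >= maxAttempts {
			return status, errors.New("server still overloaded after retries")
		}
		sleep(backoffDelay(attempt, 100*time.Millisecond, 5*time.Second, rand.Float64()))
	}
}

// noSleep skips waiting; useful for demos and tests.
func noSleep(time.Duration) {}

func main() {
	calls := 0
	// Simulated server: overloaded for the first two calls, then healthy.
	do := func() (int, error) {
		calls++
		if calls < 3 {
			return http.StatusServiceUnavailable, nil
		}
		return http.StatusOK, nil
	}
	status, err := retryOn503(5, do, noSleep)
	fmt.Println(status, err, calls)
}
```

In real use, pass time.Sleep as the sleep argument and have do perform the actual HTTP request and return its status code.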

Read requests

HTTP methods such as GET and LIST are not subject to write request limiting. This allows operators to continue querying server state without needing to retry.

Vault server overloaded

When Vault has reached capacity, new requests will be immediately rejected with a retryable 503 - Service Unavailable error.
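From the server's side, immediate rejection at capacity can be illustrated with a small HTTP middleware. This is a simplified sketch, not Vault's code: limitInFlight and rejectedStatus are hypothetical names, and the capacity here is fixed, whereas the Request Limiter adapts its limit from observed latency.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// limitInFlight wraps next so that at most capacity requests run at once.
// Requests arriving while all slots are taken are rejected with a 503
// instead of being queued.
func limitInFlight(capacity int, next http.Handler) http.Handler {
	slots := make(chan struct{}, capacity)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case slots <- struct{}{}:
			defer func() { <-slots }()
			next.ServeHTTP(w, r)
		default:
			http.Error(w, "server overloaded, retry later", http.StatusServiceUnavailable)
		}
	})
}

// rejectedStatus holds one request in flight on a capacity-1 handler and
// returns the status code that a second, concurrent request receives.
func rejectedStatus() int {
	entered := make(chan struct{})
	release := make(chan struct{})
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		close(entered) // signal that the only slot is now occupied
		<-release      // stay in flight until told to finish
	})
	h := limitInFlight(1, slow)

	finished := make(chan struct{})
	go func() {
		h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodPut, "/v1/secret/a", nil))
		close(finished)
	}()
	<-entered

	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodPut, "/v1/secret/b", nil))

	close(release)
	<-finished
	return rec.Code
}

func main() {
	fmt.Println("status while at capacity:", rejectedStatus())
}
```

The non-blocking select is what makes the rejection immediate: a request that cannot take a slot never waits behind the ones already running.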