Resource quotas

Every single interaction with Vault, whether to put secrets into the key/value store or to generate new pairs of database credentials for the MySQL database, needs to go through Vault’s API.

One side effect of the API driven model is that applications and users can misbehave by overwhelming system resources through consistent and high volume API requests resulting in denial-of-service issues in some Vault nodes or even the entire Vault cluster. This risk is significantly increased when Vault API endpoints are exposed to thousands or millions of services deployed across a global infrastructure, especially in use cases of Vault where Vault is deployed as a service for internal developers.

Vault provides a feature, resource quotas, that allows Vault operators to specify limits on resources used in Vault. Specifically, Vault allows operators to create and configure API rate limits. Users of Vault Enterprise have a second option, alongside rate limits, lease-count quotas, which can limit the number of leases that can be in use at one time.

Rate limit quotas

Vault allows operators to create rate limit quotas which enforce API rate limiting using a token bucket algorithm. A rate limit quota can be created at the root level or defined on a namespace, mount, or full API path by specifying a path when creating the quota. The rate limiter is applied to each unique client IP address on a per-node basis (i.e. rate limit quotas are not replicated). A client may invoke rate requests at any given second, after which they may invoke additional requests at rate per-second.

A rate limit quota defined at the root level (i.e. empty path) is inherited by all namespaces and mounts. It acts as a single rate limiter for the entire Vault API. A rate limit quota defined on a namespace takes precedence over the global rate limit quota, a rate limit quota defined for a mount takes precedence over the global and namespace rate limit quotas, and a rate limit quota defined for a specific path will take precedence over global, namespace, and mount quotas. Additionally, quotas defined with a role to limit login requests made using that role on the specified auth mount will take precedence over all other quotas. In other words, the most specific quota rule will be applied.

A rate limit can be created with an optional block_interval, such that when set to a non-zero value, any client that hits a rate limit threshold will be blocked from all subsequent requests for a duration of block_interval seconds.

Vault also allows the inspection of the state of rate limiting in a Vault node through various metrics exposed and through enabling optional audit logging.

Feature	Description	Community Edition	Enterprise
Rate limit quotas	Limit maximum amount of requests per second (RPS) to a system or mount to protect network bandwidth	✅	✅
Rate limit quotas	Identity-based and collective rate limits with `group_by` modes	❌	✅
Lease count quotas	Cap number of leases generated in a system or mount to protect system stability and storage performance at scale	❌	✅

Exempt routes

By default, the following paths are exempt from rate limiting. However, Vault operators can override the set of paths that are exempt from all rate limit resource quotas by updating the rate_limit_exempt_paths configuration field.

Replication

When creating or modifying a quota on a performance secondary cluster, the path referred to by the quota must exist on the secondary cluster. Vault will error if you attempt to create or update a quota for a path that doesn't exist.

Quotas are stored on the primary cluster and all requests to create, update, or delete quotas get forwarded to the primary cluster. This has several implications:

It is not possible to create a quota for local mount on a performance secondary cluster.
Quotas can be read on a performance secondary cluster, even when the path referred to by the quota is local to the primary cluster.
If you create a performance primary and performance secondary cluster and set up a local mount on both clusters with the same path, a user that only has access to the performance secondary cluster can create/update a quota for this path that will apply to the performance primary cluster.

Rate limit quota precedence

You can define quotas on namespaces, mounts, and paths. If the path is an auth mount with a concept of roles (such as /auth/approle/), the rate limit quota restricts login requests to that mount with the specified role.

Only one rate limit quota rule can exist for any given path, and the most granular rate limit quota takes effect on the requests.

To understand which rate limit quota rule applies to a request, here is the order of precedence:

The rate limit quota matching the target namespace, mount, and role
The rate limit quota matching the target namespace, mount, and path
The rate limit quota matching the target namespace, mount, and the longest prefix of the target path that ends with a trailing glob (*)
The rate limit quota matching the target namespace, and mount
The rate limit quota matching the target namespace
The rate limit quota matching the closest parent namespace, as long as the match has the inheritable field set
Global rate limit quota

API

Resource quotas can be managed over the HTTP API. Please see Rate Limit Quotas API, Lease Count Quotas API, and Quotas Config API for more details.