Resource quotas
When many teams or users are sharing a Nomad cluster, there is the concern that a single user could use more than their fair share of resources. Resource quotas provide a mechanism for cluster administrators to restrict the resources within a namespace.
Once you attach a quota specification to a namespace, the Nomad cluster counts all resource usage by jobs in that namespace toward the quota limits. If a quota limit is exhausted, allocations within the namespace are queued until resources become available, either because other jobs finish or because the quota is expanded.
We recommend enabling resource quotas in shared environments where multiple teams or applications run on the same Nomad cluster. The recommendations follow; for a step-by-step guide to implementing quotas, visit the Resource quotas tutorial page.
Quota Specification
Quota specifications are first-class objects in Nomad. A quota specification has a unique name, an optional human-readable description, and a set of quota limits. The quota limits define the allowed resource usage within a region.
Quota objects are shareable among namespaces. This allows an operator to define higher-level quota specifications. For example, an operator can define a single “Team-A” quota specification and apply it to multiple namespaces.
Resource quotas are defined in a quota specification file, which you register with the nomad quota apply command. Here's an example:
name = "team-a-quota"
limit {
  region = "global"
  region_limit {
    cpu    = 2000 # MHz
    memory = 4096 # MB
  }
}
Note
It's crucial to properly design your namespace structure and workload placement within those namespaces, considering resource requirements and cluster capacity.
Applying Quotas to Namespaces
Once you have defined and applied a quota specification, you can attach it to one or more namespaces. The following example applies a quota to a namespace.
- Add the quota parameter to your namespace specification file:
  name        = "team-a-namespace"
  description = "Namespace for Team A."
  quota       = "team-a-quota"
- Run the following command:
  nomad namespace apply ./anamespace.hcl
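Because quota specifications can be shared among namespaces, a second namespace can reference the same quota. The following is a minimal sketch; the team-a-batch namespace and its file name are hypothetical and used only for illustration.

```hcl
# team-a-batch.hcl (hypothetical file and namespace name)
name        = "team-a-batch"
description = "Batch workloads for Team A."

# Reuses the quota already attached to team-a-namespace, so both
# namespaces are governed by the same team-a-quota specification.
quota = "team-a-quota"
```

You would apply this file the same way, by pointing nomad namespace apply at it.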
Version Control
We recommend keeping your namespace and quota specifications in version control for auditability and troubleshooting, and managing their deployment through a build pipeline with a tool such as Terraform. The Nomad provider for Terraform offers a straightforward way to manage quotas and namespaces within your pipeline.
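As a rough sketch of what this could look like in Terraform, the configuration below assumes the Nomad provider's nomad_quota_specification and nomad_namespace resources. The resource labels are illustrative, and you should verify attribute names against the provider documentation for the version you use.

```hcl
# Sketch only: mirrors the team-a-quota example as Terraform resources.
resource "nomad_quota_specification" "team_a" {
  name        = "team-a-quota"
  description = "CPU and memory limits for Team A."

  limits {
    region = "global"

    region_limit {
      cpu       = 2000 # MHz
      memory_mb = 4096 # MB
    }
  }
}

# The namespace references the quota by name, so Terraform creates
# the quota specification before the namespace that depends on it.
resource "nomad_namespace" "team_a" {
  name        = "team-a-namespace"
  description = "Namespace for Team A."
  quota       = nomad_quota_specification.team_a.name
}
```

Keeping both resources in the same configuration means a change to a limit goes through the same review and plan steps as any other infrastructure change.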
Monitoring
Regularly monitor resource usage to ensure teams stay within their quotas. Nomad exposes several metrics that you can use for monitoring and alerting on quotas. Ensure the following parameter is set to false in your client configuration.
telemetry {
disable_quota_utilization_metrics = false
}
- The nomad.nomad.blocked_evals.total_quota_limit metric can be used to alert you when jobs are being blocked due to quotas being reached.
- The nomad.quota.utilization.cpu, nomad.quota.utilization.cores, and nomad.quota.utilization.memory_mb metrics report resource consumption for a quota.
Be sure to filter by quota name, namespace, or region so that your reports accurately show where a limit has been reached.
ACLs
By implementing strict access control measures, you can prevent users from bypassing or modifying resource quotas without proper authorization.
This is accomplished with the quota {} block within an ACL specification. See the ACL section for more details.
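For illustration, an ACL policy along these lines would let a team submit jobs in its own namespace while only viewing quotas. This is a sketch that reuses the team-a-namespace name from the earlier example; adjust the namespace and policy levels to your own access model.

```hcl
# Hypothetical policy for Team A developers.
# Grants job submission rights in the team's namespace only.
namespace "team-a-namespace" {
  policy = "write"
}

# Allows viewing quota specifications, but not creating,
# modifying, or deleting them.
quota {
  policy = "read"
}
```

Reserving the quota write policy for platform operators keeps limit changes in the hands of the people responsible for cluster capacity.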
Federated Clusters
Nomad makes working with quotas in a federated cluster simple by replicating quota specifications from the authoritative Nomad region. This allows operators to interact with a single cluster but create quota specifications that apply to all Nomad clusters. For example, you can create a single quota specification that defines multiple regions, each with its own limits.
name = "federated-example"
description = "A single quota spec affecting multiple regions"
limit {
region = "europe"
region_limit {
cpu = 20000
memory = 10000
}
}
limit {
region = "asia"
region_limit {
cpu = 10000
memory = 5000
}
}
Once this quota is applied to a namespace, it will be available to all federated clusters.
Communication
Ensure that all teams are aware of their resource quotas and the importance of adhering to them. Regular communication, monitoring dashboards, and alerting can help avoid unexpected resource exhaustion and reduce the burden on the Nomad platform operators.