Control plane scaling
As your Nomad deployment grows and evolves, you may need to scale your server nodes to maintain performance and reliability.
This section outlines various strategies for scaling Nomad server nodes, each suited to different scenarios and requirements.
Metrics to monitor
Keeping an eye on the right metrics is crucial for making smart decisions about how to scale your Nomad cluster. While HashiCorp provides a comprehensive list of Key Metrics for overall cluster health and Server Metrics for server-specific insights, it's important to remember that every setup is unique.
Here are some general guidelines on which metrics typically point towards specific scaling methods. Keep in mind that these are just starting points – you'll need to fine-tune based on your particular needs and setup.
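If you are not already collecting these metrics, the Nomad agent can publish them for Prometheus or another compatible APM. The telemetry stanza below is a minimal sketch; the collection interval is an assumption to tune for your environment.

telemetry {
  collection_interval        = "10s"
  prometheus_metrics         = true
  publish_allocation_metrics = true
  publish_node_metrics       = true
}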
Horizontal scaling
- nomad.worker.invoke_scheduler.service
- nomad.worker.invoke_scheduler.batch
- nomad.worker.invoke_scheduler.system
- nomad.client.allocated.cpu
- nomad.client.unallocated.cpu
- nomad.client.allocated.memory
- nomad.client.unallocated.memory
Vertical scaling
- nomad.worker.invoke_scheduler.service
- nomad.worker.invoke_scheduler.batch
- nomad.worker.invoke_scheduler.system
- nomad.blocked_evals.total_blocked (if due to scheduler resource constraints)
- nomad.plan.wait_for_index (if due to leader resource constraints)
- nomad.plan.evaluate
- nomad.plan.queue_depth
- nomad.broker.total_unacked
- nomad.broker.total_pending
- nomad.plan.submit
- nomad.client.allocated.cpu
- nomad.client.unallocated.cpu
- nomad.client.allocated.memory
- nomad.client.unallocated.memory
Read-only replicas
- nomad.worker.invoke_scheduler.service
- nomad.worker.invoke_scheduler.batch
- nomad.worker.invoke_scheduler.system
- nomad.client.allocated.cpu
- nomad.client.unallocated.cpu
- nomad.client.allocated.memory
- nomad.client.unallocated.memory
Multi-region federated clusters
- nomad.client.allocated.cpu (if consistently high across all nodes)
- nomad.client.allocated.disk (if consistently high across all nodes)
- nomad.client.allocated.iops (if consistently high across all nodes)
- nomad.client.allocated.memory (if consistently high across all nodes)
Vertical scaling
Vertical scaling increases the capacity of your Nomad server nodes by adding CPU, memory, or disk IO to the underlying host.
In a traditional datacenter, you would typically add resources to the existing VM. However, we recommend deploying new servers with the enhanced specifications instead. This method uses Autopilot and version tags.
Refer to the metrics listed in the vertical scaling section above. When those metrics trend upward steadily over a period of time, proceed with scaling the server nodes vertically.
Ensure a version is tagged on the server nodes that are currently deployed. This tag is a version number of your choosing that is not related to the Nomad version.
nomad operator autopilot set-config -upgrade-version-tag=1.1.0
Alternatively via the agent configuration:
server {
  upgrade_version = "1.1.0"
}
Launch new server nodes with increased CPU, memory, or disk resources and a higher version tag. For example:
upgrade_version = "1.2.0"
Add these new nodes into your existing Nomad cluster.
Autopilot will automatically add the new nodes and remove the old nodes from the cluster. If you are using a CSP, you will still need to delete the old VMs yourself, ideally with a build pipeline.
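Once the new nodes have joined and the old nodes have been removed, you can verify cluster membership and the Autopilot configuration with the standard CLI commands, for example:

nomad server members
nomad operator raft list-peers
nomad operator autopilot get-config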
It is important to note that vertical scaling is the most effective method for addressing control plane resource contention, as the scheduling engine must serialize through the broker and "plan applier" on the leader. However, you may encounter limitations such as reaching the maximum capacity of a single instance, prohibitive costs, or the need to meet high availability requirements. In such cases, it is advisable to consider horizontal scaling strategies, implementing read-only replicas, or exploring the option of federated clusters for your Nomad infrastructure.
Horizontal scaling
Horizontal scaling means adding server nodes to the existing cluster so that work is distributed across the joined nodes.
Prerequisites
- Refer to the metrics listed in the horizontal scaling section above. When those metrics trend upward steadily over a period of time, proceed with scaling the server nodes horizontally.
- An existing AMI or image template created and available. Packer can be used to make this process easy.
- Leveraging Terraform and the relevant providers can make the implementation much quicker. Configuring these components from scratch is out of scope of this document; however, below are links to get you started.
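However the new nodes are provisioned, each one ultimately joins the cluster through its server configuration. The snippet below is a minimal sketch, with placeholder addresses for the existing servers:

server {
  enabled = true

  server_join {
    # Placeholder addresses of the existing server nodes
    retry_join = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]
  }
}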
Should I use the Nomad autoscaler?
It is not recommended to use the autoscaler for scaling the server nodes from 3 to 5. There are several reasons for this:
- Once you scale up to 5 server nodes, it's often unnecessary and potentially destabilizing to scale back down.
- The autoscaler currently lacks mechanisms to ensure the Raft leader is not inadvertently removed during a scale-down operation, which could disrupt cluster stability.
- Managing a fluctuating number of server nodes can introduce unnecessary complexity in maintaining quorum and overall cluster health.
Instead, it is generally recommended to only increase the number of server nodes via Terraform based on careful evaluation of your cluster's performance and needs. This approach allows for more controlled and deliberate scaling decisions, ensuring the continued stability and efficiency of your Nomad cluster.
Enhanced read-only replica nodes
Nomad Enterprise provides the ability to scale clustered Nomad servers to include voting servers and non-voting read replicas. Read replicas still receive data from the cluster replication; however, they do not take part in quorum election operations, nor will they be promoted by Autopilot. Expanding your Nomad cluster in this way can scale read operations without impacting write latency.
Configure read-only replica nodes
Add the following configuration to the Nomad server configuration file on each node you want to serve read operations:
server {
  enabled           = true
  non_voting_server = true
}
Restart the Nomad server to apply the changes.
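To confirm that a node joined as a non-voter, you can list the Raft peers; non-voting servers are reported with Voter set to false (exact output columns vary by Nomad version).

nomad operator raft list-peers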
Always monitor your read-only nodes and scale as needed.
Autoscaler
Unlike horizontal scaling of the voting servers, the autoscaler can be used to scale read-only nodes up and down because these nodes do not participate in leader voting.
Note
The autoscaler only applies to Nomad server and client nodes hosted on AWS, Azure, or GCP.
Prerequisites
- Familiarize yourself with the concepts and complete the tutorial.
- Refer to the metrics listed in the read-only replicas section above.
- An existing AMI or image template created and available. Ensure this template has the non_voting_server = true parameter set in the server configuration. Packer can be used to make this process easy.
- If using redundancy zones, ensure the redundancy_zone is set and your Terraform deployment pipeline supports this.
- Leveraging Terraform and the relevant providers can make the implementation much quicker. Configuring these components from scratch is out of scope of this document; however, below are links to get you started.
- AWS Terraform provider resources
- AWS tutorials
- Azure Terraform provider resources
- Azure Tutorials
- GCP Terraform provider resources
- GCP Tutorials
Example read-only autoscaling policy for server nodes
Remember, autoscaling is an art. Every use case and environment is unique and will use a different combination of APMs, strategies, and targets, which will require fine-tuning. Some metrics may not be sufficient on their own and will require a secondary metric to truly determine if the autoscaler should be triggered.
Additionally, always take a proactive approach when autoscaling. Never wait for issues to present themselves before autoscaling, as that can cause the autoscaler to fail the scaling event or cause downtime for workloads.
The example below evaluates whether resources are constrained on the server nodes. It triggers a scaling event on an AWS ASG target if all of the following conditions are true:
- Memory usage is above 80%
- CPU usage is above 80%
- Goroutine count spikes 50% over a 5-minute period.
Remember, this example is meant as a starting point which will be sufficient for most read-only server node scaling use cases.
template {
  data = <<EOF
scaling "cluster_policy" {
  enabled = true
  min     = 0
  max     = 5

  policy {
    cooldown            = "2m"
    evaluation_interval = "1m"

    check "cpu_allocated_percentage" {
      source = "prometheus"
      query  = "sum(nomad_client_allocated_cpu{}*100/(nomad_client_unallocated_cpu{}+nomad_client_allocated_cpu{}))/count(nomad_client_allocated_cpu{})"

      strategy "threshold" {
        lower_bound = 80
        delta       = 1
      }
    }

    check "mem_allocated_percentage" {
      source = "prometheus"
      query  = "sum(nomad_client_allocated_memory{}*100/(nomad_client_unallocated_memory{}+nomad_client_allocated_memory{}))/count(nomad_client_allocated_memory{})"

      strategy "threshold" {
        lower_bound = 80
        delta       = 1
      }
    }

    check "goroutine_spike" {
      source = "prometheus"
      query  = "100 * (max(nomad_runtime_num_goroutines{}) / max(nomad_runtime_num_goroutines{} offset 5m) - 1)"

      strategy "threshold" {
        lower_bound = 50
        delta       = 1
      }
    }

    target "aws-asg" {
      dry-run             = "false"
      aws_asg_name        = "nomad-servers"
      node_drain_deadline = "5m"
    }
  }
}
EOF

  destination = "${NOMAD_TASK_DIR}/policies/hashistack.hcl"
}
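For the policy above to be evaluated, a Nomad Autoscaler agent needs a Prometheus APM plugin, an aws-asg target plugin, and a policy directory containing the rendered file. The agent configuration below is a minimal sketch only; the addresses, region, and directory path are placeholder assumptions.

nomad {
  # Address of a Nomad server or load-balanced endpoint (placeholder)
  address = "http://127.0.0.1:4646"
}

apm "prometheus" {
  driver = "prometheus"
  config = {
    address = "http://prometheus.example.com:9090"
  }
}

target "aws-asg" {
  driver = "aws-asg"
  config = {
    aws_region = "us-east-1"
  }
}

policy {
  # Directory where the rendered policy file from the template above is placed
  dir = "/opt/nomad-autoscaler/policies"
}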
Federated clusters
Federated clusters allow multiple Nomad clusters to work together, providing high availability and disaster recovery. By deploying Nomad clusters across multiple regions, users can interact with any Nomad server endpoint, regardless of its location. This provides a single pane of glass without worrying about which specific cluster they are connected to.
Warning
When ACLs are enabled, Nomad depends on an authoritative region to act as a single source of truth for ACL policies and global ACL tokens.
Refer to this tutorial on how to set up federation.
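Federation is ultimately established by joining a server in one region to a server in another region. For example, run the following from a server in the joining region, where the address is a placeholder for a server in the existing region:

nomad server join 10.1.0.10:4648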
Consider implementing federated clusters when:
- You have workloads in different regions or datacenters that you want to manage centrally.
- Your single Nomad cluster is struggling to handle the scale of your operations across multiple locations.
- You need to maintain separate clusters for compliance or organizational reasons, but still want a unified view and management capability.
- You're looking to improve fault tolerance and disaster recovery by distributing your workloads across multiple clusters.
Migrate a job to another cluster
Once a new cluster has been federated with the primary cluster, the jobs will need to be updated to deploy to the new cluster. Detailing a job migration is outside the scope of this document, as each organization has vastly different and nuanced requirements and processes. If there is an existing deployment pipeline, it should be fairly straightforward.
The most difficult part is ensuring underlying infrastructure and application dependencies are met, such as all networking and firewall rules. Existing ACL and Sentinel policies do not need to be copied over manually as they are replicated automatically.
Within Nomad, it is as simple as updating or adding the region and datacenters values in your job file.
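For example, a job targeted at a newly federated region might look like the following sketch; the region name, datacenter name, and container image are placeholders.

job "web" {
  # Placeholder region and datacenter for the newly federated cluster
  region      = "west"
  datacenters = ["west-1"]

  group "web" {
    task "server" {
      driver = "docker"

      config {
        image = "nginx:1.25"
      }
    }
  }
}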
Scaling strategy summary
The following table presents scaling strategies in the typical order of implementation as your Nomad cluster grows and evolves.
Strategy | Pros | Cons | Best for |
---|---|---|---|
Vertical scaling | Simplest way to increase capacity | There is a cap to how large a single instance can scale | You need quick capacity increases without the added cost of more nodes |
Horizontal scaling | Easy to implement with no downtime | Increased instance cost; not recommended to scale past 5 nodes | Current server node count is 3, instance sizes are near maximum, and you are observing performance bottlenecks |
Read-only replicas | Improved read scalability, reduced write contention | Additional resource overhead | You've reached vertical and horizontal scaling limits, or have large scheduling events or other read operation bottlenecks |
Federated clusters | Global scale, improved fault tolerance | Increased operational complexity | Multi-region deployments, large enterprise footprint |
Scaling Nomad server nodes is a crucial aspect of maintaining a healthy and efficient cluster as your workload demands increase. Remember that each scaling strategy has its own pros and cons, and the best approach depends on your specific use case, growth trajectory, and operational capabilities.
Tip
Before scaling in production, it's crucial to thoroughly test the changes in a development or test environment.
This allows you to assess the impact on your entire infrastructure, including networking, storage, and any other systems that interact with your Nomad cluster.
Use this testing phase to identify and resolve any potential issues, and to fine-tune your approach.
Ensure that you have a robust backup strategy in place before proceeding.
Regularly create and verify snapshots of your Nomad state, as these will be invaluable for quick recovery if needed during the scaling process.
Once you're confident in the results and have confirmed that your backup process is working correctly, update your disaster recovery and backup strategies to account for the new server specifications. Only then should you proceed with scaling your production environment, knowing you have a safety net in place.
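For example, the built-in snapshot commands can capture and verify server state; the file name below is a placeholder.

nomad operator snapshot save backup.snap
nomad operator snapshot inspect backup.snap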