Control plane scaling
As your Nomad deployment grows and evolves, you may need to scale your server nodes to maintain performance and reliability.
This section outlines various strategies for scaling Nomad server nodes, each suited to different scenarios and requirements.
Metrics to monitor
Keeping an eye on the right metrics is crucial for making smart decisions about how to scale your Nomad cluster. While HashiCorp provides a comprehensive list of Key Metrics for overall cluster health and Server Metrics for server-specific insights, it's important to remember that every setup is unique.
Here are some general guidelines on which metrics typically point towards specific scaling methods. Keep in mind that these are just starting points – you'll need to fine-tune based on your particular needs and setup.
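If you are not already collecting these metrics, the Nomad agent can publish them for Prometheus or another compatible APM. The telemetry stanza below is a minimal sketch; the collection interval is an assumption to tune for your environment.

telemetry {
  collection_interval        = "10s"
  prometheus_metrics         = true
  publish_allocation_metrics = true
  publish_node_metrics       = true
}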
Horizontal scaling
- nomad.worker.invoke_scheduler.service
- nomad.worker.invoke_scheduler.batch
- nomad.worker.invoke_scheduler.system
- nomad.client.allocated.cpu
- nomad.client.unallocated.cpu
- nomad.client.allocated.memory
- nomad.client.unallocated.memory
Vertical scaling
- nomad.worker.invoke_scheduler.service
- nomad.worker.invoke_scheduler.batch
- nomad.worker.invoke_scheduler.system
- nomad.blocked_evals.total_blocked (if due to scheduler resource constraints)
- nomad.plan.wait_for_index (if due to leader resource constraints)
- nomad.plan.evaluate
- nomad.plan.queue_depth
- nomad.broker.total_unacked
- nomad.broker.total_pending
- nomad.plan.submit
- nomad.client.allocated.cpu
- nomad.client.unallocated.cpu
- nomad.client.allocated.memory
- nomad.client.unallocated.memory
Read-only replicas
- nomad.worker.invoke_scheduler.service
- nomad.worker.invoke_scheduler.batch
- nomad.worker.invoke_scheduler.system
- nomad.client.allocated.cpu
- nomad.client.unallocated.cpu
- nomad.client.allocated.memory
- nomad.client.unallocated.memory
Multi-region federated clusters
- nomad.client.allocated.cpu (if consistently high across all nodes)
- nomad.client.allocated.disk (if consistently high across all nodes)
- nomad.client.allocated.iops (if consistently high across all nodes)
- nomad.client.allocated.memory (if consistently high across all nodes)
Vertical scaling
Vertical scaling increases the capacity of your Nomad server nodes by adding CPU, memory, or disk IO to the underlying host.
In a traditional datacenter, you would typically add resources to the existing VM. However, we recommend deploying new servers with the enhanced specifications instead. This method uses Autopilot and version tags.
Refer to the metrics listed in the vertical scaling section above. When those metrics trend upward steadily over a period of time, proceed with scaling the server nodes vertically.
Ensure a version is tagged on the server nodes that are currently deployed. This tag is a version number of your choosing that is not related to the Nomad version.
nomad operator autopilot set-config -upgrade-version-tag=1.1.0
Alternatively via the agent configuration:
server {
  upgrade_version = "1.1.0"
}
Launch new server nodes with increased CPU, memory, or disk resources and a higher version tag. For example:
upgrade_version = "1.2.0"
Add these new nodes into your existing Nomad cluster.
Autopilot will automatically add the new nodes and remove the old nodes from the cluster. If you are using a CSP, you will still need to delete the old VMs yourself, ideally with a build pipeline.
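Once the new nodes have joined and the old nodes have been removed, you can verify cluster membership and the Autopilot configuration with the standard CLI commands, for example:

nomad server members
nomad operator raft list-peers
nomad operator autopilot get-config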
It is important to note that vertical scaling is the most effective method for addressing control plane resource contention, as the scheduling engine must serialize through the broker and "plan applier" on the leader. However, you may encounter limitations such as reaching the maximum capacity of a single instance, prohibitive costs, or the need to meet high availability requirements. In such cases, it is advisable to consider horizontal scaling strategies, implementing read-only replicas, or exploring the option of federated clusters for your Nomad infrastructure.
Horizontal scaling
Horizontal scaling means adding server nodes to the existing cluster so that work is distributed across the joined nodes.
Prerequisites
- Refer to the metrics listed in the horizontal scaling section above. When those metrics trend upward steadily over a period of time, proceed with scaling the server nodes horizontally.
- An existing AMI or image template created and available. Packer can be used to make this process easy.
- Leveraging Terraform and the relevant providers can make the implementation much quicker. Configuring these components from scratch is out of scope of this document; however, below are links to get you started.
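However the new nodes are provisioned, each one ultimately joins the cluster through its server configuration. The snippet below is a minimal sketch, with placeholder addresses for the existing servers:

server {
  enabled = true

  server_join {
    # Placeholder addresses of the existing server nodes
    retry_join = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]
  }
}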
Should I use the Nomad autoscaler?
It is not recommended to use the autoscaler for scaling the server nodes from 3 to 5. There are several reasons for this:
- Once you scale up to 5 server nodes, it's often unnecessary and potentially destabilizing to scale back down.
- The autoscaler currently lacks mechanisms to ensure the Raft leader is not inadvertently removed during a scale-down operation, which could disrupt cluster stability.
- Managing a fluctuating number of server nodes can introduce unnecessary complexity in maintaining quorum and overall cluster health.
Instead, it is generally recommended to only increase the number of server nodes via Terraform based on careful evaluation of your cluster's performance and needs. This approach allows for more controlled and deliberate scaling decisions, ensuring the continued stability and efficiency of your Nomad cluster.
Enhanced read-only replica nodes
Nomad Enterprise provides the ability to scale clustered Nomad servers to include voting servers and non-voting read replicas. Read replicas still receive data from the cluster replication; however, they do not take part in quorum election operations, nor will they be promoted by Autopilot. Expanding your Nomad cluster in this way can scale read operations without impacting write latency.
Configure read-only replica nodes
Add the following configuration to the Nomad server configuration file on each node you want to serve read operations:
server {
  enabled           = true
  non_voting_server = true
}
Restart the Nomad server to apply the changes.
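To confirm that a node joined as a non-voter, you can list the Raft peers; non-voting servers are reported with Voter set to false (exact output columns vary by Nomad version).

nomad operator raft list-peers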
Always monitor your read-only nodes and scale as needed.
Autoscaler
Unlike horizontal scaling of the voting servers, the autoscaler can be used to scale read-only nodes up and down because these nodes do not participate in leader voting.
Note
The autoscaler only applies to Nomad server and client nodes hosted on AWS, Azure, or GCP.
Prerequisites
- Familiarize yourself with the concepts and complete the tutorial.
- Refer to the metrics listed in the read-only replicas section above.
- An existing AMI or image template created and available. Ensure this template has the non_voting_server = true parameter set in the server configuration. Packer can be used to make this process easy.
- If using redundancy zones, ensure the redundancy_zone is set and your Terraform deployment pipeline supports this.
- Leveraging Terraform and the relevant providers can make the implementation much quicker. Configuring these components from scratch is out of scope of this document; however, below are links to get you started.
- AWS Terraform provider resources
- AWS tutorials
- Azure Terraform provider resources
- Azure Tutorials
- GCP Terraform provider resources
- GCP Tutorials
Example read-only autoscaling policy for server nodes
Remember, autoscaling is an art. Every use case and environment is unique and will use a different combination of APMs, strategies, and targets, which will require fine-tuning. Some metrics may not be sufficient on their own and will require a secondary metric to truly determine if the autoscaler should be triggered.
Additionally, always take a proactive approach when autoscaling. Never wait for issues to present themselves before autoscaling, as that can cause the autoscaler to fail the scaling event or cause downtime for workloads.
The example below evaluates whether resources are constrained on the server nodes. It triggers a scaling event on an AWS ASG target if all of the following conditions are true:
- Memory usage is above 80%
- CPU usage is above 80%
- Goroutine count spikes 50% over a 5-minute period.
Remember, this example is meant as a starting point which will be sufficient for most read-only server node scaling use cases.
template {
  data = <<EOF
scaling "cluster_policy" {
  enabled = true
  min     = 0
  max     = 5

  policy {
    cooldown            = "2m"
    evaluation_interval = "1m"

    check "cpu_allocated_percentage" {
      source = "prometheus"
      query  = "sum(nomad_client_allocated_cpu{}*100/(nomad_client_unallocated_cpu{}+nomad_client_allocated_cpu{}))/count(nomad_client_allocated_cpu{})"

      strategy "threshold" {
        lower_bound = 80
        delta       = 1
      }
    }

    check "mem_allocated_percentage" {
      source = "prometheus"
      query  = "sum(nomad_client_allocated_memory{}*100/(nomad_client_unallocated_memory{}+nomad_client_allocated_memory{}))/count(nomad_client_allocated_memory{})"

      strategy "threshold" {
        lower_bound = 80
        delta       = 1
      }
    }

    check "goroutine_spike" {
      source = "prometheus"
      query  = "100 * (max(nomad_runtime_num_goroutines{}) / max(nomad_runtime_num_goroutines{} offset 5m) - 1)"

      strategy "threshold" {
        lower_bound = 50
        delta       = 1
      }
    }

    target "aws-asg" {
      dry-run             = "false"
      aws_asg_name        = "nomad-servers"
      node_drain_deadline = "5m"
    }
  }
}
EOF

  destination = "${NOMAD_TASK_DIR}/policies/hashistack.hcl"
}
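For the policy above to be evaluated, a Nomad Autoscaler agent needs a Prometheus APM plugin, an aws-asg target plugin, and a policy directory containing the rendered file. The agent configuration below is a minimal sketch only; the addresses, region, and directory path are placeholder assumptions.

nomad {
  # Address of a Nomad server or load-balanced endpoint (placeholder)
  address = "http://127.0.0.1:4646"
}

apm "prometheus" {
  driver = "prometheus"
  config = {
    address = "http://prometheus.example.com:9090"
  }
}

target "aws-asg" {
  driver = "aws-asg"
  config = {
    aws_region = "us-east-1"
  }
}

policy {
  # Directory where the rendered policy file from the template above is placed
  dir = "/opt/nomad-autoscaler/policies"
}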
Federated clusters
Federated clusters allow multiple Nomad clusters to work together, providing high availability and disaster recovery. By deploying Nomad clusters across multiple regions, users can interact with any Nomad server endpoint, regardless of its location. This provides a single pane of glass without worrying about which specific cluster they are connected to.
Warning
When ACLs are enabled, Nomad depends on an authoritative region to act as a single source of truth for ACL policies and global ACL tokens.
Refer to this tutorial on how to set up federation.
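Federation is ultimately established by joining a server in one region to a server in another region. For example, run the following from a server in the joining region, where the address is a placeholder for a server in the existing region:

nomad server join 10.1.0.10:4648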
Consider implementing federated clusters when:
- You have workloads in different regions or datacenters that you want to manage centrally.
- Your single Nomad cluster is struggling to handle the scale of your operations across multiple locations.
- You need to maintain separate clusters for compliance or organizational reasons, but still want a unified view and management capability.
- You're looking to improve fault tolerance and disaster recovery by distributing your workloads across multiple clusters.
Migrate a job to another cluster
Once a new cluster has been federated with the primary cluster, the jobs will need to be updated to deploy to the new cluster. Detailing a job migration is outside the scope of this document, as each organization has vastly different and nuanced requirements and processes. If there is an existing deployment pipeline, it should be fairly straightforward.
The most difficult part is ensuring underlying infrastructure and application dependencies are met, such as all networking and firewall rules. Existing ACL and Sentinel policies do not need to be copied over manually as they are replicated automatically.
Within Nomad, it is as simple as updating or adding the region and datacenters values in your job file.
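For example, a job targeted at a newly federated region might look like the following sketch; the region name, datacenter name, and container image are placeholders.

job "web" {
  # Placeholder region and datacenter for the newly federated cluster
  region      = "west"
  datacenters = ["west-1"]

  group "web" {
    task "server" {
      driver = "docker"

      config {
        image = "nginx:1.25"
      }
    }
  }
}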
Scaling strategy summary
The following table presents scaling strategies in the typical order of implementation as your Nomad cluster grows and evolves.
Strategy | Pros | Cons | Best for |
---|---|---|---|
Vertical scaling | Simplest way to increase capacity | There is a cap to how large a single instance can scale | You need quick capacity increases without the added cost of more nodes |
Horizontal scaling | Easy to implement with no downtime | Increased instance cost; not recommended to scale past 5 nodes | Current server node count is 3, instance sizes are near maximum, and you are observing performance bottlenecks |
Read-only replicas | Improved read scalability, reduced write contention | Additional resource overhead | You've reached vertical and horizontal scaling limits, or have large scheduling events or other read operation bottlenecks |
Federated clusters | Global scale, improved fault tolerance | Increased operational complexity | Multi-region deployments, large enterprise footprint |
Scaling Nomad server nodes is a crucial aspect of maintaining a healthy and efficient cluster as your workload demands increase. Remember that each scaling strategy has its own pros and cons, and the best approach depends on your specific use case, growth trajectory, and operational capabilities.
Tip
Before scaling in production, it's crucial to thoroughly test the changes in a development or test environment.
This allows you to assess the impact on your entire infrastructure, including networking, storage, and any other systems that interact with your Nomad cluster.
Use this testing phase to identify and resolve any potential issues, and to fine-tune your approach.
Ensure that you have a robust backup strategy in place before proceeding.
Regularly create and verify snapshots of your Nomad state, as these will be invaluable for quick recovery if needed during the scaling process.
Once you're confident in the results and have confirmed that your backup process is working correctly, update your disaster recovery and backup strategies to account for the new server specifications. Only then should you proceed with scaling your production environment, knowing you have a safety net in place.
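For example, the built-in snapshot commands can capture and verify server state; the file name below is a placeholder.

nomad operator snapshot save backup.snap
nomad operator snapshot inspect backup.snap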