Application Scaling
Application scaling is a critical aspect of operating Nomad Enterprise as a shared service. It enables organizations to efficiently manage resources, maintain application performance, and optimize costs in dynamic environments. As workloads fluctuate, the ability to automatically adjust resources becomes paramount for ensuring operational efficiency.
Nomad Enterprise offers multiple scaling mechanisms to address the diverse needs of modern applications. These include horizontal scaling, which adjusts the number of task instances, and dynamic application sizing, which modifies resource allocations for individual tasks.
Resource Allocation with Resource Blocks
Resource blocks in Nomad job specifications are crucial for ensuring optimal resource allocation and utilization. By defining resource requirements for tasks, you can guarantee that allocations will be placed on nodes that have sufficient capacity to run your applications.
When configuring required resources for your workload, focus on these primary attributes and their associated configuration parameters.
- cpu & memory: resources{}
- disk: ephemeral_disk{}
- device type: device{}
Always specify CPU and memory requirements for each task. Allocate slightly more resources than the default minimum (CPU = 100 MHz, memory = 300 MB) to account for peak loads and prevent resource contention. Nomad will automatically find a suitable node with the available resources you specify. This should always be the first step before considering horizontal scaling.
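For illustration, a minimal resources{} block inside a task might look like the following sketch; the values are placeholders rather than recommendations:

  task "web" {
    driver = "docker"
    # ...

    # Request slightly more than the Nomad defaults to absorb peak load.
    resources {
      cpu    = 500 # MHz
      memory = 512 # MB
    }
  }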
Conduct thorough performance testing to determine precise resource requirements for your applications.
Dynamic application sizing, along with proper stress testing, can give you an idea of what resources your application requires.
The ephemeral_disk{} block is only used for data that lives within the allocation itself. This does not include any mounted volumes or bind paths outside of the allocation directory. Ephemeral disks are a good fit for data that you can rebuild if needed, such as an in-progress cache or a local copy of data.
If your application data cannot be ephemeral, or you need a larger allocation disk, consider using volumes. If your workload has specific device requirements, use the device{} block to request access to a GPU, FPGA, or TPU.
NUMA settings can also be leveraged for your application if needed.
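As a rough sketch, the ephemeral_disk{} and device{} blocks described above might look like this; the sizes and device type are illustrative only:

  group "app" {
    # 1 GB of rebuildable scratch space local to the allocation.
    ephemeral_disk {
      size    = 1024 # MB
      migrate = true # best-effort move of the data when the allocation is rescheduled
    }

    task "train" {
      # Request a single GPU for this task (illustrative vendor/type).
      resources {
        device "nvidia/gpu" {
          count = 1
        }
      }
    }
  }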
Horizontal Application Scaling with Nomad Autoscaler
Horizontal scaling involves adding or removing instances of an application to handle varying loads.
The count parameter determines the number of instances within a group and can be increased manually via the UI, CLI, or API.
Manual adjustments may be sufficient during your early Nomad adoption journey, but as your footprint grows and your processes mature, you will need to leverage the Nomad Autoscaler.
The Autoscaler can dynamically adjust the number of instances based on predefined policies.
At a high level, the Autoscaler operates with three key components:
- APM (Application Performance Monitoring): Collects metrics from the Nomad API, Prometheus, or Datadog. This is configured by the Nomad operators in the client node configuration file and the Autoscaler job specification. Prometheus is the recommended APM; see the Observability section.
- Strategy: Determines how to interpret metrics and make scaling decisions.
- Target: Executes the scaling actions on the Nomad cluster. For horizontal application scaling, this block can be omitted because Nomad populates it on job submission.
For more details, see the Nomad Autoscaler concepts page.
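For orientation, a minimal Autoscaler agent configuration wired to Prometheus might look like the sketch below; the addresses are placeholders, and in a shared-service environment the operators typically own this file:

  # Where the Autoscaler reaches the Nomad API.
  nomad {
    address = "http://nomad.example.com:4646"
  }

  # APM plugin: the metrics source referenced by policy checks.
  apm "prometheus" {
    driver = "prometheus"
    config = {
      address = "http://prometheus.example.com:9090"
    }
  }

  # Strategy plugins such as target-value and threshold ship with the
  # Autoscaler, and for horizontal application scaling the Nomad target
  # is populated from the job on submission, so neither needs to be
  # declared here.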
Horizontal Scaling Strategies
The Nomad Autoscaler provides several strategies for horizontal application scaling, each designed to address different scaling scenarios and requirements. This section details four strategies along with their use cases and recommendations to assist you in choosing the right strategy.
Threshold
The Threshold strategy scales based on upper and lower bounds of a metric, increasing when above the upper bound and decreasing when below the lower bound. For example, adding instances when CPU utilization exceeds 80% and removing instances when it drops below 30%.
Use Cases
- Maintaining metrics within an acceptable range (e.g., memory usage between 40-80%)
- Applications with bursty workloads where rapid scaling is required to handle sudden spikes in demand
- Managing resources efficiently by scaling down during periods of low demand
Recommendations
- Set appropriate upper and lower bounds based on application behavior and requirements
- Use wider thresholds for more stable scaling, narrower for more responsive scaling
- Combine with appropriate cooldown periods to prevent rapid scaling oscillations
- Regularly tune the thresholds based on historical data and forecasted demand to ensure optimal scaling
Target Value
The Target Value strategy aims to maintain a specific metric at a desired target value. Unlike the Threshold strategy, which is reactive, the Target Value strategy is proactive. For example, to keep CPU utilization at 70%, Nomad adjusts the number of instances to achieve this target.
Use Cases
- Applications with dynamic workloads where maintaining a specific performance metric is crucial
- Helps in optimizing costs by scaling resources based on actual demand, avoiding over-provisioning
Recommendations
- Carefully select the metric that best represents the application's performance and resource needs
- Use with metrics that have a clear correlation to instance count (e.g., CPU usage)
- Implement appropriate cooldown periods to prevent oscillation
Pass-Through
The Pass-Through strategy allows external systems or custom logic to dictate the number of instances. Nomad acts as an executor, scaling the application based on the input it receives, which offloads the scaling decision logic from the Nomad cluster itself (see the sketch after the recommendations below).
Use Cases
- Custom scaling logic is required, such as integrating with external monitoring tools
- Direct mapping of metrics to instance count (e.g., one instance per active user)
- Scaling based on external systems or custom metrics
Recommendations
- Ensure the metric directly correlates to the desired instance count
- Implement safeguards (e.g., max_scale_up and max_scale_down) to prevent rapid over- or under-scaling
- Ensure robust integration and failure testing between Nomad and the external system to avoid discrepancies in scaling decisions
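As an illustrative sketch only, a pass-through check might read a value that an external system publishes to Prometheus and use it directly as the desired instance count; the metric name here is hypothetical:

  check "external_desired_count" {
    source = "prometheus"
    # Hypothetical metric written by an external capacity-planning system.
    query = "my_app_desired_instances"

    # Use the metric value directly as the target count.
    strategy "pass-through" {}
  }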
Fixed Value
Used for client node scaling and not relevant to application scaling.
When choosing a strategy, understand your workload thoroughly and select one that aligns with your application's behavior and scaling needs. Start with conservative scaling strategies, and as you fine-tune and observe the scaling behavior, consider combining multiple strategies for more nuanced scaling decisions. Regular monitoring and adjustment of your scaling configurations are essential as your application and workload evolve.
Always be mindful of the resource implications to ensure your cluster can handle the potential maximum scale. Implement safeguards by setting appropriate minimum and maximum values to prevent under- or over-scaling, and use suitable cooldown periods to avoid rapid scaling oscillations. Sentinel policies, resource quotas, and node pool governance can be implemented by operators as guardrails.
Thorough testing, including simulations of various load scenarios, is vital to verify the effectiveness of your scaling strategies.
Evaluation Interval and Cooldown Period
Evaluation interval and cooldown period are parameters that significantly impact the responsiveness and stability of your scaling operations.
Evaluation Interval
How often Nomad Autoscaler looks at your metrics to decide if it needs to scale up or down. It affects how quickly your system can respond to changes, how often it might scale, and how much work the Autoscaler itself has to do. Set it too short, and you might be scaling unnecessarily often; too long, and you might miss important spikes and cause performance degradation.
When choosing an interval, consider the following:
- In most cases, the default settings should be sufficient, however critical and volatile workloads may require shorter intervals for quicker responses
- Shorter intervals (e.g., 5 to 30 seconds) increase the load on the Autoscaler and metric sources, but can lead to more accurate scaling decisions, optimizing resource utilization for the applications
- Longer intervals (e.g., 5 to 10 minutes) can be used if there are resource constraints on the autoscaler or if your application can tolerate slower response times to workload changes. This can help reduce the load on the autoscaler while still meeting the scaling needs of less time-sensitive applications
Cooldown Period
The cooldown period is a wait time enforced after each scaling action, during which no new scaling operations are allowed. It serves to prevent rapid successive scaling actions, allow time for the system to stabilize after a scaling event, and reduce unnecessary scaling actions caused by temporary spikes or dips in metrics.
When choosing a cooldown period, consider the following:
- Should be longer than the time it takes for new instances to become fully operational
- For workloads with predictable patterns, a longer cooldown period (e.g., 10 to 15 minutes) can be effective. For more volatile workloads, a shorter period (e.g., 2 to 5 minutes) might be necessary
- Consider how long it takes for metrics to stabilize and reflect the impact of a scaling action
Scaling Policy Examples
Below are basic and common scaling policies. These should be sufficient as a starting point; refine your scaling logic over time.
Memory and CPU Scaling
job "app" {
datacenters = ["dc1"]
group "app" {
count = 1
scaling {
enabled = true
min = 1
max = 10
policy {
cooldown = "10m"
evaluation_interval = "5m"
check "memory_usage" {
source = "prometheus"
query = "nomad_client_allocs_memory_allocated{alloc_id=\"${NOMAD_ALLOC_ID}\",task_group=\"${NOMAD_GROUP_NAME}\"}"
strategy "threshold" {
upper_bound = 512
lower_bound = 400
}
}
check "cpu_usage" {
source = "prometheus"
query = "nomad_client_allocs_cpu_allocated{alloc_id=\"${NOMAD_ALLOC_ID}\",task_group=\"${NOMAD_GROUP_NAME}\"}"
strategy "threshold" {
upper_bound = 1000
lower_bound = 700
}
}
}
}
# ...
}
}
Multiple checks can be defined. When both are satisfied, instances will be added when the upper_bound of 512 MB of memory and 1000 MHz of CPU is met, until the defined max count is reached, and instances will be removed when the lower_bound of 400 MB of memory and 700 MHz of CPU is met. The evaluation interval is set to a conservative 5 minutes to respond to changes in traffic, and the cooldown period is set to 10 minutes to allow the system to stabilize after each scaling action.
Custom Metrics
job "app" {
datacenters = ["dc1"]
group "app" {
count = 1
scaling {
enabled = true
min = 3
max = 10
policy {
cooldown = "90s"
evaluation_interval = "15s"
check "request_rate" {
source = "prometheus"
query = "sum(rate(http_requests_total{}[1m]))"
strategy "target-value" {
target = 100
}
}
}
}
# ...
}
}
This example shows an API service that requires an aggressive scaling policy using a custom Prometheus metric specific to that application. A scaling event (either up or down) is triggered to keep the request-rate metric at the target value of 100, and the minimum of 3 instances ensures high availability. The evaluation_interval is set to a short 15 seconds so the policy reacts quickly to changes in traffic, and, assuming the API service has been tested thoroughly and is stable, the cooldown can confidently be set to 90 seconds so the system stabilizes quickly after each scaling action.
Spread Considerations
The spread block allows application owners to increase the failure tolerance of their applications by specifying a node attribute that allocations should be spread over. Allocations can be spread over attributes such as datacenter, availability zone, or even rack in a physical datacenter via node metadata.
In a self-service environment, end users deploying applications into Nomad may not always be aware of cluster-level spread configurations. As a best practice, it's advisable to include a spread block in your job file when specific distribution requirements are needed. This approach ensures that your application's distribution needs are met, regardless of any cluster-level settings.
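A short sketch of a group that spreads allocations across datacenters and, secondarily, across a rack metadata attribute (this assumes operators set meta.rack on client nodes):

  group "app" {
    count = 6

    # Prefer an even distribution across datacenters first...
    spread {
      attribute = "${node.datacenter}"
      weight    = 100
    }

    # ...then across racks.
    spread {
      attribute = "${meta.rack}"
      weight    = 50
    }
  }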
Note
When scaling down (reducing the number of allocations), the autoscaler does not consider the spread configuration or metadata when choosing which allocations to remove. Instead, it uses allocation IDs to determine the order of removal. For example, if you have:
- Allocations 1, 2, 3 on Node 1
- Allocations 4, 5, 6 on Node 2
- Allocations 7, 8, 9 on Node 3
When scaling down, Nomad will remove allocations in reverse order (9, 8, 7, etc.) rather than maintaining an even distribution across nodes. This behavior can potentially impact the availability and balance of your application across the cluster during scale-down operations.
- When specific distribution requirements are needed, always include a spread block in your job file. This ensures your requirements are met regardless of cluster-level configurations.
- Be aware of the current limitation in scale-down operations. You may need to implement additional logic or monitoring to maintain desired distribution during scale-down events.
- You can use multiple spread blocks to create more complex distribution strategies. For example, spreading across both datacenters and rack IDs.
- Regularly monitor the actual distribution of your allocations to ensure they align with your intended spread configuration.
Dynamic Application Sizing
Dynamic application sizing (DAS) allows Nomad to provide CPU and memory recommendations to tasks based on their actual usage. This feature complements horizontal scaling by optimizing resource utilization at the individual task level.
The scaling block is the same as the one used for horizontal scaling; the main differences are that for vertical scaling via DAS it is applied at the task level, and that three additional strategies are available.
The Dynamic Application Sizing concepts and tutorial pages provide an overview of the concepts, how to configure it, and recommendations.
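As a rough sketch, assuming DAS is available in your Nomad Enterprise cluster, task-level scaling blocks for CPU and memory might look like the following; the bounds and strategy choices are illustrative:

  task "app" {
    # ...

    # Vertical scaling policy for CPU (MHz).
    scaling "cpu" {
      enabled = true
      min     = 100
      max     = 1000

      policy {
        check "avg_cpu" {
          # One of the DAS strategies: app-sizing-avg, app-sizing-max,
          # or app-sizing-percentile.
          strategy "app-sizing-avg" {}
        }
      }
    }

    # Vertical scaling policy for memory (MB).
    scaling "mem" {
      enabled = true
      min     = 256
      max     = 2048

      policy {
        check "max_mem" {
          strategy "app-sizing-max" {}
        }
      }
    }
  }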
While DAS doesn't automatically apply recommended resource changes to tasks, you can implement automation to streamline the process and reduce manual intervention. Consider the following approaches if you're confident in your metrics and DAS configuration:
- If the native Nomad UI is not sufficient, then you can utilize the Nomad API to retrieve recommendations and display them in a custom dashboard for easy monitoring and decision-making.
- Implement alerts based on DAS recommendations or significant resource changes. These alerts can be directed to operators, application owners, or both, ensuring timely awareness of potential optimizations.
- Incorporate DAS recommendations into your deployment process for new versions to ensure they receive the latest recommended resource allocations.
- DAS is configured individually for each group or task rather than through a global setting that affects all jobs in your Nomad environment, so there is no single, overarching switch that enables DAS across all jobs simultaneously. You will need to modify each job specification in your deployment pipeline to incorporate the appropriate DAS configuration blocks.
Stateful Workload Considerations
When implementing autoscaling for stateful applications in Nomad, several important factors need to be considered to ensure data integrity and performance. Be aware that certain stateful applications have inherent limitations on how they can be scaled, and some may not be a good fit for autoscaling at all.
Storage
The underlying storage is crucial for stateful applications, and this is especially true when autoscaling. For container workloads, leverage Nomad's Container Storage Interface (CSI) support to dynamically attach storage to instances as your workload scales. This gives you the flexibility to manage storage resources on the fly, adapting to your application's changing needs.
Networking
While networking is generally less of a concern when autoscaling within the same Nomad cluster, some stateful applications require stable network identities (e.g., IP addresses) for client connections or inter-node communication. Be aware of these requirements when designing your scaling strategy and collaborate closely with your networking team.
State Consistency
New instances need to be synchronized with the current state of the application, which can be time-consuming and resource-intensive. Consider the performance impact on your stateful application during scale events and plan accordingly.
Additionally, implement conservative scaling policies that allow sufficient time for data synchronization and state management between scaling events.
Clustered Applications
For clustered applications, utilize variable locks to prevent split-brain scenarios and data corruption during scale-up and scale-down events.
Kill timeout
Ensure your application has proper shutdown procedures to safely persist state before scaling down. Use kill_timeout to allow sufficient time for graceful shutdowns, and consider setting max_kill_timeout in the client configuration so that job authors cannot exceed that limit.
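A brief sketch of both sides of this setting; the values are illustrative:

  # In the job specification: give the task time to flush state on shutdown.
  task "db" {
    kill_timeout = "90s"
    # ...
  }

  # In the Nomad client configuration: cap what job authors may request.
  client {
    max_kill_timeout = "120s"
  }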
Service Discovery
Implement service discovery or service mesh to dynamically manage network identities and route traffic appropriately.
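For example, a minimal sketch of registering a service with a health check, so that instances added or removed by the Autoscaler are discovered automatically; the names and paths are illustrative:

  service {
    name     = "app"
    port     = "http"
    provider = "consul" # or "nomad" for Nomad native service discovery

    check {
      type     = "http"
      path     = "/health"
      interval = "10s"
      timeout  = "2s"
    }
  }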
As always, use a well-tested autoscaling development environment to identify and resolve any issues before implementing in production. This is particularly crucial for stateful applications due to their data persistence requirements. For additional details not related to autoscaling, see the Considerations for Stateful Workloads page.
Standardization
To promote a self-service model, operators should consider implementing job templates or utilizing Nomad Pack to provide a consistent deployment workflow that includes standard metrics across applications while still allowing for customization.
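One lightweight way to standardize while still allowing customization is to parameterize the scaling bounds with HCL2 variables in a shared job template; this is a sketch, and the variable names are arbitrary:

  variable "scaling_min" {
    type    = number
    default = 1
  }

  variable "scaling_max" {
    type    = number
    default = 5
  }

  job "app" {
    group "app" {
      scaling {
        enabled = true
        min     = var.scaling_min
        max     = var.scaling_max
        # policy { ... } supplied by the template, overridable per application
      }
      # ...
    }
  }

Application teams can then supply their own values at submission time, for example with -var flags or a variables file.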
An alternative method involves creating a global scaling policy within the autoscaler agent configuration. However, it's important to recognize that a one-size-fits-all approach may not be suitable for all applications.
In many cases, it's more effective to allow job authors to determine the specific scaling requirements for their applications.
Operators can then use features such as Sentinel, resource quotas, and node pool governance to establish appropriate guardrails. This balanced approach ensures flexibility for individual application needs while maintaining overall system integrity and resource management.