Self-Service Application Deployment
Nomad Enterprise offers a powerful and flexible platform for self-service application deployment, catering to a wide range of workloads and deployment strategies.
As an individual deploying applications into Nomad, your primary focus should be on your application's requirements rather than the infrastructure hosting your application. If any of the capabilities discussed below are not available in your environment, establish a clear communication channel with the Nomad platform operators to address the gaps.
To facilitate a smooth self-service experience, Nomad operators should implement a standardized workflow. This approach eliminates guesswork regarding available resources, networking options, and storage solutions.
Along with the topics discussed under Initial Configuration, the key components discussed in depth throughout this section should leave you equipped to provide a powerful self-service solution for deploying your applications to Nomad.
Advanced Deployment Strategies
Nomad Enterprise offers modern deployment strategies for your long-running services that allow for seamless updates, risk mitigation, and minimal downtime. These strategies support both container and non-container workloads.
While rolling upgrades offer an excellent starting point for organizations embarking on their modernization journey, the ultimate goal for many is to implement sophisticated Blue/Green deployment models.
Nomad's flexible approach allows teams to evolve their deployment practices gradually, adapting to increasing complexity and operational maturity over time.
Rolling Upgrades
Rolling upgrades gradually replace instances of the old version with the new version. If there is an issue during the deployment, automatic rollback can be leveraged to minimize service disruption.
This strategy is great for the early stages of Nomad adoption as it balances efficiency and risk, making it suitable for many applications, but it can be slower for large deployments and requires careful management of multiple versions during the transition.
Rollbacks are simple to execute and crucial for minimizing downtime; however, they don't prevent the initial impact of a deployment and shouldn't be relied upon for zero disruption.
For example, if a user has a sticky session with an instance that is replaced during a rolling upgrade, the session may not be redirected to the newer instance; instead, it is terminated.
View the Rolling updates tutorial for more information on enabling this feature in your jobs.
Recommendations
- View the `update` documentation for all available options; a minimal example follows this list.
- Set `auto_revert` to `true` to revert and roll back to the latest stable job version.
  Note
  Consider whether this could cause issues for your stateful workloads such as databases. If so, set this to `false`.
- Ensure your health checks and timeouts are set correctly for your tasks, since the `update` block uses the `check` block within a task to verify a task is healthy before it is promoted. See the service discovery section for more details on configuring the `service.check` block.
- Use `stagger` and `max_parallel` to ensure minimal disruption.
- View the status of the rolling update on your job's landing page in the Nomad UI.
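The following is a minimal sketch of an `update` block that combines these recommendations. The group name, count, and timing values are illustrative placeholders, not recommended defaults:

group "app" {
  count = 6

  update {
    max_parallel     = 2        # replace two allocations at a time
    stagger          = "30s"    # wait between batches of replacements
    health_check     = "checks" # use the task's service checks to gate progress
    min_healthy_time = "10s"
    healthy_deadline = "3m"
    auto_revert      = true     # roll back to the last stable version on failure
  }
}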
Single Canary Deployment
Canary deployments involve releasing an instance of the new version to a small subset of users without affecting the existing workloads. This is accomplished by using a load balancer or proxy that can dynamically update instances and set various endpoint weights.
Consul is a great option here, using a service router with service mesh. When you confirm there are no errors, you promote the deployment, which replaces the older instances through a rolling upgrade.
Canary deployments are a great next step after rolling upgrades. Single canary deployments carry the same risks as rolling upgrades, and the small sample size can sometimes lead to misleading performance or usage metrics. The single instance is meant as a production-grade test before the final rolling upgrade.
View the Deploy with canaries tutorial for more information on how to enable this functionality for your jobs.
If using Consul service mesh, visit the Deploy seamless canary deployments with service splitters tutorial to learn how to leverage Consul for B/G and Canary deployments.
Recommendations
- Carefully evaluate which subset of users will be the best candidates for testing and will be quick to report any issues
- Use Consul or another 3rd party load balancer or proxy to steer traffic during the upgrade
- Manual promotion may be required for tier 1 critical workloads
- Use automated rollback mechanisms to fail the canary deployment if certain metric thresholds are breached, as in the sketch after this list
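The following is a minimal sketch of an `update` block for a single canary with manual promotion; the values shown are illustrative:

update {
  max_parallel = 2
  canary       = 1       # run one canary allocation alongside the current version
  auto_revert  = true
  auto_promote = false   # promote manually, for example with `nomad deployment promote <deployment-id>`
}

Until the deployment is promoted, the canary runs alongside the existing allocations, so the older version continues to serve traffic.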
Blue/Green Deployments
After you have implemented single canary deployments, blue/green deployments are the final step in maturing your deployment strategy. In a blue/green deployment, there are two application versions.
Only one application version is active at a time, except during the transition phase from one version to the next. The term "active" tends to mean "receiving traffic" or "in service". Blue/green deployments scale the best and provide zero-downtime deployments.
Similar to single canary, this is accomplished by using a load balancer or proxy that can dynamically update instances and set various endpoint weights. Consul is a great option by using a service router with service mesh.
When you confirm there are no errors, you cut over traffic incrementally to the newer instances.
Visit the Blue/Green Deployments tutorial to learn how to implement this strategy.
If using Consul service mesh, visit the Deploy seamless canary deployments with service splitters tutorial to learn how to leverage Consul for B/G and Canary deployments.
Recommendations
- If feasible, consider implementing blue/green deployments at the start of your project.
Tip
This approach depends on having a load balancer or proxy that can split and route Layer 7 traffic, which might not be available in your environment.
If these prerequisites are not met, begin with the rolling upgrades strategy as you develop your canary and blue/green upgrade strategies.
- Instead of an entire cutover, consider gradually shifting traffic from blue to green. This strategy helps mitigate risks by limiting the potential impact on the broader user base, allowing for more controlled observation and easier rollback if issues arise.
- Implement automated testing of the new instances before switching users over to them. If your automated testing is sufficient, consider using the `auto_promote` flag, as in the sketch after this list.
- Leverage monitoring and alerting to reveal any issues when new users are moved over to the new instances.
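As a sketch, setting `canary` equal to the group's `count` produces a blue/green deployment: a full green set is created alongside the blue set and replaces it only when the deployment is promoted. The count and other values here are placeholders:

group "app" {
  count = 3

  update {
    canary       = 3       # equal to count: a full "green" set runs alongside the "blue" set
    max_parallel = 3
    auto_promote = false   # promote only after the green set has been verified
    auto_revert  = true
  }
}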
Workload Dependencies
Task dependencies occur when one task relies on another task, job, or external resource to be running or available. There are several patterns to manage these dependencies to ensure your application is dynamic and resilient.
Lifecycle block
The lifecycle block is used to express task dependencies in Nomad by configuring when a task runs within the lifecycle of a task group. Main tasks are tasks that do not have a lifecycle block.
- Init Tasks - A task that completes and exits before the main tasks proceed
- Sidecar Pattern - A task that starts after the main tasks, but stays alive and runs alongside them
- Cleanup Task - A task that runs after the main tasks have stopped. It is useful for performing post-processing that isn't available in the main tasks.
- Leader task - Can be used to specify that a task is the leader of the group. If used, all other tasks depend on the leader: the leader task stops first, followed by non-sidecar and non-poststop tasks, and finally sidecar tasks. Once this process completes, post-stop tasks are triggered.
These patterns are illustrated in the sketch after this list.
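The following is a minimal sketch showing a prestart init task and a sidecar running alongside a main task. The task names, driver, and images are placeholders:

group "app" {
  # Init task: runs to completion before the main task starts
  task "migrate-db" {
    lifecycle {
      hook    = "prestart"
      sidecar = false
    }
    driver = "docker"
    config {
      image = "example/migrate:1.0"       # placeholder image
    }
  }

  # Sidecar: starts after the main task and keeps running alongside it
  task "log-forwarder" {
    lifecycle {
      hook    = "poststart"
      sidecar = true
    }
    driver = "docker"
    config {
      image = "example/log-forwarder:1.0" # placeholder image
    }
  }

  # Main task: no lifecycle block
  task "web" {
    driver = "docker"
    config {
      image = "example/web:1.0"           # placeholder image
    }
  }
}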
Template block
As it relates to task dependencies, `template` blocks are useful for generating dynamic configuration data from other workloads, such as:
- The IP address and/or dynamic ports of a service
- KV data retrieved from Consul or Vault
- A task's metadata values
- Job- or client-node-related variables used as environment variables, such as the job name, datacenter name, or the bridged network IP address
Templates should be used if you need to reference another task within the job or you expect the data of the upstream job to change. View the service discovery section for more information on templating.
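For example, the following template is a minimal sketch that renders job and node variables, plus a Consul KV value, as environment variables for the task; the KV path is a placeholder:

template {
  data        = <<EOH
JOB_NAME={{ env "NOMAD_JOB_NAME" }}
DATACENTER={{ env "node.datacenter" }}
FEATURE_FLAGS={{ key "myapp/feature-flags" }}
EOH
  destination = "local/app.env"
  env         = true   # export the rendered file as environment variables for the task
}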
Warning
For templates that read from Vault, Consul, or Nomad, each item read is called a "dependency".
All `template` blocks share the same internal runner, which de-duplicates dependencies requesting the same item.
You should avoid having large numbers of dependencies for a given task, as each dependency requires at least one concurrent request (a possibly blocking query) to the upstream server.
If a task has more than 128 dependencies, a warn-level log will appear in the Nomad client logs which reports "watching this many dependencies could DDoS your servers", referring to the Vault, Consul, or Nomad cluster being queried.
Tip
For templates that read from Vault, Consul, or Nomad, the `vault_retry`, `consul_retry`, and `nomad_retry` settings in each Nomad client's configuration file control how retries are handled when Vault, Consul, or Nomad is unavailable. The default behavior, which includes killing allocations after a number of failed retries, might not suit every environment.
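The following is a minimal sketch of those settings in a Nomad client agent configuration; the values are illustrative rather than recommended:

client {
  template {
    consul_retry {
      attempts    = 12       # 0 retries forever
      backoff     = "250ms"
      max_backoff = "1m"
    }
    vault_retry {
      attempts    = 12
      backoff     = "250ms"
      max_backoff = "1m"
    }
    nomad_retry {
      attempts    = 12
      backoff     = "250ms"
      max_backoff = "1m"
    }
  }
}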
Health Checking
The service discovery section details health checking for your services to ensure they are running properly.
However, there are some task dependency specific recommendations:
- If possible, implement a reasonable retry attempt within your application before sending exit signals.
- The built-in health checking covers most use cases; however, if it is not sufficient, consider implementing circuit breakers within your application code or through Consul service mesh
- You can run multiple health checks for a service. One check could query the application's own health endpoint, and another could run a script that executes a custom binary used in your environment, as shown below.
service {
  # HTTP check against the task's health endpoint
  check {
    name     = "HTTP Check"
    type     = "http"
    port     = "api"
    path     = "/v1/health"
    interval = "5s"
    timeout  = "2s"
  }

  # Script check that executes a custom binary inside the task;
  # script checks also require an interval and timeout
  check {
    name     = "Script Check"
    type     = "script"
    command  = "/bin/some-binary"
    args     = ["some", "args"]
    interval = "10s"
    timeout  = "5s"
  }
}
See the service discovery health check section for more detailed explanations and recommendations.
External dependencies
When dealing with external dependencies outside of Nomad, you may need to consider several strategies to ensure your applications are resilient and can handle potential issues with these dependencies. Here are some recommendations:
artifact block
The artifact block allows you to download external resources before starting a task. This is useful for fetching configuration files, libraries, or other dependencies. Unlike templates, artifacts are only retrieved when the task starts, so prefer templates when you expect the data to change.
artifact {
  source      = "https://example.com/app-config.json"
  destination = "local/config.json"
}
- Avoid using hardcoded and static data.
Tip
Leverage service discovery to reference details about dependent services via templates, as shown below.
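For example, the following template is a minimal sketch that looks up an upstream service registered in Consul instead of hardcoding its address; the service name and destination path are placeholders:

template {
  data        = <<EOH
{{ range service "postgres" }}
DB_ADDR={{ .Address }}:{{ .Port }}
{{ end }}
EOH
  destination = "local/db.env"
  env         = true
  change_mode = "restart"   # restart the task if the upstream address changes
}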