Nomad version-specific upgrade guides

The upgrading page covers the details of doing a standard upgrade. However, specific versions of Nomad may have more details provided for their upgrades as a result of new features or changed behavior. This page is used to document those details separately from the standard upgrade flow.

Nomad 1.11.2

QEMU driver

The QEMU driver now uses host file paths for filesystem environment variables instead of relative container paths such as /alloc and /local. You may need to update job specs utilizing these variables to reflect the new values.

Nomad 1.11.1

Storage fingerprinting calculation changed

Nomad now calculates the storage available for scheduling using only `totalBytes - client.reserved.disk`. The previous strategy using free disk space could lead to incorrect values when clients with running allocations restarted. The `unique.storage.bytesfree` attribute has also been removed. We recommend that you reserve at least the amount of disk that is used by the host OS.
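As a minimal sketch of the recommendation above, disk can be reserved in the client block of the agent configuration. The amount shown is an arbitrary illustration and should be sized to what the host OS actually uses.

```hcl
client {
  enabled = true

  reserved {
    # Disk to hold back from scheduling, in MB. The value is hypothetical.
    disk = 10240
  }
}
```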

QEMU driver

In Nomad 1.11.1, emulator and machine_type were added to the task config. These default to the previously used values of qemu-system-x86_64 and pc. Previously, when using the kvm accelerator, the machine type host was forced. This is no longer the case: the value of machine_type is always used. Additionally, when using resources.cores with the kvm accelerator, the -smp flag was hardcoded to that number of cores. This is now done only if the user has not specified a custom -smp flag.
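The following sketch shows where the new options would sit in a QEMU task, using the default values named above. The task name, image path, and the custom -smp value are hypothetical, and passing -smp through args is an assumption about how a custom flag would be supplied.

```hcl
task "vm" {
  driver = "qemu"

  config {
    image_path   = "local/linux.img"    # hypothetical image
    accelerator  = "kvm"
    emulator     = "qemu-system-x86_64" # default value
    machine_type = "pc"                 # default value

    # A custom -smp flag; when present, Nomad no longer sets -smp
    # from resources.cores.
    args = ["-smp", "2"]
  }
}
```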

Nomad 1.11.0

Sysbatch jobs will no longer accept `reschedule` blocks

In Nomad 1.11.0, submitting a sysbatch job with a reschedule block returns an error instead of being silently ignored, as it was in previous versions. The same behavior applies to system jobs.

Eval broker metrics for dispatch and periodic jobs

The leader records metrics for the eval broker. In Nomad 1.11.0 the job label on the nomad.nomad.broker.wait_time, nomad.nomad.broker.process_time, nomad.nomad.broker.response_time, and nomad.nomad.broker.eval_waiting metrics refers to the parent job ID for dispatch and periodic jobs. The nomad.nomad.broker.eval_waiting metric no longer has an eval_id label. For clusters running high volume dispatch workloads, this change significantly reduces metrics cardinality and memory usage on the leader.

ACL policies no longer silently ignore duplicate or invalid keys

Nomad 1.11.0 introduces stricter validation for ACL policies. Policy writes that include duplicate or invalid keys will be rejected with an error instead of being silently ignored. Any existing policies with duplicate or invalid keys will continue to work, but the source policy document will need to be updated to be valid before it can be written to Nomad.

Maximum number of allocations per job is limited by default

Nomad 1.11.0 limits the maximum number of allocations for a job to the value of the new job_max_count server configuration option, which defaults to 50000. The number of allocations is determined from the sum of the job's task group count fields. This limit is enforced at the time the job is submitted or scaled, and updating the value will not impact existing jobs.
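If the default is too low for a cluster, the limit could be raised in the server block, as in this sketch. The value is an arbitrary illustration.

```hcl
server {
  enabled = true

  # Maximum allocations per job (sum of task group counts).
  # The default is 50000; the value below is hypothetical.
  job_max_count = 100000
}
```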

Deprecated resource fields on Node API

The Resources and Reserved fields on the Go API's Node struct, as well as the equivalent fields on the Read Node API, are deprecated. These fields are never populated. Use the NodeResources and ReservedResources fields instead.

Enterprise product usage reporting
Enterprise

Nomad Enterprise 1.11.0 adds detailed product usage information to automated license utilization reporting.

Nomad 1.10.6

ACL policies no longer silently ignore duplicate or invalid keys

Nomad 1.10.6 introduces stricter validation for ACL policies. Policy writes that include duplicate or invalid keys will be rejected with an error instead of being silently ignored. Any existing policies with duplicate or invalid keys will continue to work, but the source policy document will need to be updated to be valid before it can be written to Nomad.

Enterprise product usage reporting
Enterprise

Nomad Enterprise 1.10.6 adds detailed product usage information to automated license utilization reporting.

Nomad 1.10.2

Clients respect `telemetry.publish_allocation_metrics`

Nomad 1.10.2 fixed a bug where allocation metrics were collected and published even if the telemetry.publish_allocation_metrics configuration field was unset or set to false. If you are monitoring allocation metrics, you will need to ensure your Nomad clients set this field to true.
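A minimal sketch of the client agent configuration that keeps allocation metrics flowing after the upgrade:

```hcl
telemetry {
  # Must be set explicitly to true to publish allocation metrics.
  publish_allocation_metrics = true
}
```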

Nomad 1.10.1

Remove Raft peer by address removed

Nomad 1.4.0 removed support for Raft Protocol v2, and this removed the ability to remove Raft peers by address instead of peer ID. Nomad 1.10.1 removes the non-functional -peer-address option for the operator raft peer-remove command, and the address parameter for the DELETE /v1/operator/raft/peer API.

Agent exit on reloading configuration errors

Errors encountered when reloading agent configuration now cause agents to exit. In prior versions, Nomad only logged configuration errors during reloads. This could lead to agents running but unable to communicate. Any other errors when parsing the new configuration are logged and the reload is aborted, consistent with the current behavior.

Added Server `start_timeout` Configuration Option

Nomad 1.10.1 introduces a new server configuration option named start_timeout with a default value of 30s. This duration is used to monitor the server setup and startup processes that must complete before the server is considered healthy, such as keyring decryption. If these processes do not complete before the timeout is reached, the server process exits and any errors are logged to the console.
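For servers where startup work such as keyring decryption is known to be slow, the timeout could be extended as in this sketch. The duration shown is an arbitrary illustration.

```hcl
server {
  enabled = true

  # Default is 30s; the longer value here is hypothetical.
  start_timeout = "60s"
}
```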

Corrected `/v1/acl/token/self` response codes

Nomad 1.10.1 responds with different HTTP response codes to API calls sent to /v1/acl/token/self. For users that do not have ACLs enabled, the endpoint responds with a 200 code and a response body that indicates that ACLs are disabled. Previously, the response code in this scenario was 404.

For users that do have ACLs enabled but do not present a valid ACL token, the endpoint responds with a 403 code. Previously, the response code in this scenario was 404.

Nomad 1.10.0

Quota specification variables_limit deprecated
Enterprise

The quota specification's variables_limit field is deprecated. We replaced it with a new storage block with a variables field, under the region_limit block. Existing quotas will be automatically migrated during server upgrade. We will remove the variables_limit field from the quota specification in Nomad 1.12.0.
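A sketch of a quota specification using the new layout might look like the following. The quota name, region, and all limit values are hypothetical, and the unit of the variables value should be confirmed against the quota specification reference.

```hcl
name        = "default-quota" # hypothetical quota name
description = "Limit the shared default namespace"

limit {
  region = "global"

  region_limit {
    cpu    = 2500
    memory = 1000

    # Replaces the deprecated variables limit field.
    storage {
      variables = 1000
    }
  }
}
```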

Nomad 1.8 deprecated `disconnect` fields removed

In Nomad 1.8, we introduced the disconnect block to replace the max_client_disconnect, stop_after_client_disconnect, and prevent_reschedule_on_lost fields. In Nomad 1.10, we removed these fields, and Nomad will ignore them if specified. Jobs should migrate to using the disconnect block prior to upgrading.
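As a rough migration sketch, a group that previously set max_client_disconnect and prevent_reschedule_on_lost could express the same intent with a disconnect block. The group name and duration are hypothetical, and the field mapping in the comments reflects our reading of the disconnect block reference.

```hcl
group "web" {
  disconnect {
    # Takes the place of max_client_disconnect.
    lost_after = "6h"

    # Inverse of prevent_reschedule_on_lost = true: do not place a
    # replacement while the client is disconnected.
    replace = false

    # stop_after_client_disconnect maps to stop_on_client_after,
    # which is not combined with lost_after in this sketch.
  }

  # ... rest of group
}
```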

Go SDK API change for quota limits

In Nomad 1.10.0, the Go API for quotas has a breaking change. The QuotaSpec.RegionLimit field is now of type QuotaResources instead of Resources. The QuotaSpec.VariablesLimit field is deprecated in lieu of QuotaSpec.RegionLimit.Storage.Variables and will be removed in Nomad 1.12.0.

Remote task driver support removed

In Nomad 1.10.0, we removed all support for remote task driver capabilities. Nomad no longer detaches drivers with the RemoteTasks capability when an allocation is lost. Also, Nomad does not detach remote tasks when a node is drained. Workloads running as remote tasks should be migrated prior to upgrading.

Loading binaries from `plugin_dir` without configuration

Plugins stored within the plugin_dir will now only be loaded when they have a corresponding plugin block in the agent configuration file. Nomad now skips any plugin found without a corresponding configuration block.
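For example, an external driver binary in plugin_dir needs a matching plugin block to be loaded, even if the block carries no settings. The directory path is hypothetical, and the Podman driver is used only as an illustration of an external plugin.

```hcl
plugin_dir = "/opt/nomad/plugins" # hypothetical path

# The label must match the plugin binary's name in plugin_dir.
plugin "nomad-driver-podman" {
  config {
    # Plugin-specific options, if any.
  }
}
```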

Sentinel apply command requires scope
Enterprise

To prevent accidentally adding policies for volumes to the job scope, the nomad sentinel apply command now requires the -scope option. Refer to the GitHub pull request for details.

Affinity and spread updates are non-destructive

We fixed a scheduler bug so that updates to affinity and spread blocks are no longer destructive. After a job update that changes only these blocks, existing allocations remain running with their job version incremented. If you were relying on the previous behavior to redistribute workloads, you can force a destructive update by changing fields that require one, such as the meta block.

Vault and Consul integration changes

Nomad 1.10.0 removes the previously deprecated token-based authentication workflow for Vault and Consul. Nomad clients must now use a task's workload identity to authenticate to Vault and Consul and obtain a token specific to the task.

This table lists removed Vault fields and the new workflow.

Field	Configuration	New Workflow
`vault.allow_unauthenticated`	Agent	Tasks should use a workload identity. Do not use a Vault token.
`vault.task_token_ttl`	Agent	With workload identity, tasks receive their TTL configuration from the Vault role.
`vault.token`	Agent	Nomad agents use the workload identity when making requests to authenticated endpoints.
`vault.policies`	Job specification	Configure and use a Vault role.

Before upgrading to Nomad 1.10, perform the following tasks:

Configure Vault and Consul to work with workload identity.
Migrate all workloads to use workload identity.
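On the job side, the migration replaces vault.policies with a Vault role, roughly as in this sketch. The task name and role name are hypothetical, and the role must already exist in Vault and be bound to Nomad workload identities.

```hcl
task "app" {
  vault {
    # Previously: policies = ["app-read"]
    role = "nomad-workloads" # hypothetical role name
  }

  # ... rest of task
}
```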

Refer to the following guides for more information:

Nomad 1.9.9

Added Server `start_timeout` Configuration Option

Nomad 1.9.9 introduces a new server configuration option named start_timeout with a default value of 30s. This duration is used to monitor the server setup and startup processes that must complete before the server is considered healthy, such as keyring decryption. If these processes do not complete before the timeout is reached, the server process exits and any errors are logged to the console.

Nomad 1.9.5

CNI plugins

Nomad 1.9.5 includes a bug fix for restoring allocation networking after a client host reboot. This fix requires recent versions of the CNI reference plugins (minimum 1.2.0) and will fall back to the existing behavior if the CNI reference plugins cannot support the fix.

We recommend installing the CNI reference plugins from the CNI project release page rather than your Linux distribution's package manager.

Nomad 1.9.4

Security updates to default deny lists

In Nomad 1.9.4, the default function_denylist includes executeTemplate, as a measure to prevent accidental or malicious infinitely recursive execution. Users that require executeTemplate should update their configuration.
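Users who need executeTemplate could override the deny list in the client's template block, along the lines of this sketch. The two remaining entries are assumed to be the defaults from before 1.9.4 and should be checked against the client configuration reference.

```hcl
client {
  template {
    # Omits executeTemplate so templates may call it again.
    # The listed entries are assumed to be the earlier defaults.
    function_denylist = ["plugin", "writeToFile"]
  }
}
```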

Additionally, the default client env deny list includes more environment variables. Users who need some of these secure environment variables passed to their tasks should consult the list and overwrite it in the configuration.

Nomad 1.9.3

In Nomad 1.9.3, the mechanism used for calculating when objects are eligible for garbage collection changes to a clock-based one. This has two consequences. First, it allows you to set arbitrarily long GC intervals. Second, it requires that Nomad servers are kept roughly in sync time-wise, because GC can originate in a follower.

Nomad 1.9.2 contained a bug that could drop all cluster state on upgrade and has been removed from downloads.

Nomad 1.9.0

Dropped support for older clients

Nomad 1.9.0 removes support for Nomad client agents older than 1.6.0. Older nodes fail heartbeats. Nomad servers mark the workloads on those nodes as lost and reschedule them normally according to the job's reschedule block.

Keyring In Raft

Nomad 1.9.0 stores keys used for signing Workload Identity and encrypting Variables in Raft, instead of storing key material in the external keystore. When using external KMS or Vault transit encryption for the keyring provider, the key encryption key (KEK) is stored outside of Nomad and no cleartext key material exists on disk. When using the default AEAD provider, the key encryption key (KEK) is stored in Raft alongside the encrypted data encryption keys (DEK).

Nomad automatically migrates the key storage for all key material on the first root_key_gc_interval after all servers are upgraded to 1.9.0. The existing on-disk keystore is required to restore servers from older snapshots, so you should continue to back up the on-disk keystore until you no longer need those older snapshots.

Support for HCLv1 removed

Nomad 1.9.0 no longer supports the HCLv1 format for job specifications. Using the -hcl1 option for the job run, job plan, and job validate commands will no longer work.

One common use of -hcl1 was when specifying Docker labels with dots in their keys such as for DataDog autodiscovery:

labels {
  "com.datadoghq.ad.check_names"  = "[\"openmetrics\"]"
  "com.datadoghq.ad.init_configs" = "[{}]"
  # ...
}

Quoted keys are invalid in HCLv2 blocks and must be specified with a list-of-maps syntax:

labels = [
  {
    "com.datadoghq.ad.check_names"  = "[\"openmetrics\"]"
    "com.datadoghq.ad.init_configs" = "[{}]"
    # ...
  }
]

Nomad 1.8.18

ACL policies no longer silently ignore duplicate or invalid keys

Nomad 1.8.18 introduces stricter validation for ACL policies. Policy writes that include duplicate or invalid keys will be rejected with an error instead of being silently ignored. Any existing policies with duplicate or invalid keys will continue to work, but the source policy document will need to be updated to be valid before it can be written to Nomad.

Enterprise product usage reporting
Enterprise

Nomad Enterprise 1.8.18 adds detailed product usage information to automated license utilization reporting.

Nomad 1.8.4

Default Docker `infra_image` changed

Due to the deprecation of the third-party gcr.io registry, the default Docker infra_image is now registry.k8s.io/pause-<arch>:3.3. If you do not override the default, clients using the docker driver will make outbound requests to the new registry.
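Clusters that cannot reach the new registry could pin the image to an internal mirror in the Docker plugin configuration, as sketched here. The registry host is hypothetical.

```hcl
plugin "docker" {
  config {
    # Hypothetical internal mirror of the pause image.
    infra_image = "registry.example.com/pause-amd64:3.3"
  }
}
```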

Nomad 1.8.3

Nomad keyring rotation

In Nomad 1.8.3, the Nomad root keyring will prepublish keys at half the root_key_rotation_threshold and promote them to active once the root_key_rotation_threshold has passed. The nomad operator root keyring rotate command now requires one of two arguments: -prepublish <duration> to prepublish a key or -now to rotate immediately. We recommend using -prepublish to avoid outages from workload identities used to log into external services such as Vault or Consul.

Nomad 1.8.2

New `windows_allow_insecure_container_admin` configuration option for Docker driver

In 1.8.2, Nomad will refuse to run jobs that use the Docker driver on Windows with Process Isolation that run as ContainerAdmin. This is in order to provide a more secure environment for these jobs, and this behavior can be overridden by setting the new windows_allow_insecure_container_admin Docker plugin configuration option to true or by setting privileged=true. We made this change as a result of regressions introduced by mitigations for HCSEC-2024-03.
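A sketch of the override in the client's Docker plugin configuration, for environments that accept the reduced isolation:

```hcl
plugin "docker" {
  config {
    # Allows process-isolated tasks to run as ContainerAdmin again.
    windows_allow_insecure_container_admin = true
  }
}
```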

New default isolation mode for Docker on Windows

Nomad 1.8.2 changes the default isolation mode for Docker tasks on Windows from process to hyperv, since hyperv provides a much more secure execution environment. We made this change as a result of regressions introduced by mitigations for HCSEC-2024-03.
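Tasks that must keep process isolation could set it explicitly in the task configuration, as in this sketch. The task and image names are hypothetical.

```hcl
task "app" {
  driver = "docker"

  config {
    image     = "example/windows-app:1.0" # hypothetical image
    isolation = "process"                 # default is now hyperv
  }
}
```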

Nomad 1.8.1

Enterprise

Nomad Enterprise 1.8.1 includes an updated version of the Sentinel library. Users that have built custom Sentinel plugins must recompile them using an SDK supporting Sentinel Plugin Protocol Version 3. Consult the Sentinel SDK Compatibility Matrix for appropriate Sentinel SDK versions.

Nomad 1.8.0

Deprecated Disconnect Fields

Nomad 1.8.0 introduces a disconnect block that groups all the configuration options related to client and server behavior during a disconnect. As a result, the stop_after_client_disconnect, max_client_disconnect, and prevent_reschedule_on_lost fields are deprecated. The block also introduces new options for reconciling allocations if the client regains connectivity.

CNI Constraints

In Nomad 1.8.0, jobs with bridge networking will have constraints added during job submit that require CNI plugins to be present on the node. Nodes have fingerprinted the available CNI plugins starting in Nomad 1.5.0.

If you are upgrading from Nomad 1.5.0 or later to 1.8.0 or later, there's nothing additional for you to do. It's not recommended to skip more than 2 versions of Nomad. But if you upgrade from earlier than 1.5.0 to 1.8.0 or later, you will need to ensure that clients have been upgraded before submitting any jobs that use bridge networking.

Removal of `raw_exec` option `no_cgroups`

In Nomad 1.7.0, the raw_exec plugin option no_cgroups became ineffective. Starting in Nomad 1.8.0, attempting to set no_cgroups in the raw_exec plugin configuration results in an error when starting the agent.

Nomad 1.7.11

Enterprise

Nomad keyring rotation

In Nomad 1.7.11, the Nomad root keyring will prepublish keys at half the root_key_rotation_threshold and promote them to active once the root_key_rotation_threshold has passed. The nomad operator root keyring rotate command now requires one of two arguments: -prepublish <duration> to prepublish a key or -now to rotate immediately. We recommend using -prepublish to avoid outages from workload identities used to log into external services such as Vault or Consul.

Nomad 1.7.10

Enterprise

New `windows_allow_insecure_container_admin` configuration option for Docker driver

In 1.7.10, Nomad will refuse to run jobs that use the Docker driver on Windows with Process Isolation that run as ContainerAdmin. This is in order to provide a more secure environment for these jobs, and this behavior can be overridden by setting the new windows_allow_insecure_container_admin Docker plugin configuration option to true or by setting privileged=true.

New default isolation mode for Docker on Windows

Nomad 1.7.10 changes the default isolation mode for Docker tasks on Windows from process to hyperv, since hyperv provides a much more secure execution environment.

Nomad 1.7.2

Nomad 1.7.2 fixes a critical bug in CPU fingerprinting in Nomad 1.7.0 and 1.7.1. You should not install Nomad 1.7.0 or 1.7.1 and instead install the latest Nomad 1.7.x version.

Nomad 1.7.0

Warning

Nomad 1.7.0 contains a critical bug in keyring replication. You should not install Nomad 1.7.0 and instead install the latest Nomad 1.7.x version.

Keyring Replication Failure After Leader Election

Nomad 1.7.0 introduced new RSA keys to the keyring for use in signing workload identities. These keys were not correctly replicated from leader to followers. This results in all workload identity verification failing after a leader election.

This bug was fixed in Nomad 1.7.1.

Vault Integration Changes

Starting in Nomad 1.7, Nomad clients will use a task's Workload Identity to authenticate to Vault and obtain a Vault token specific to the task.

The existing workflow using a Vault token provided in either the agent configuration or at the time of job submission is deprecated and will be removed in Nomad 1.10. The vault.policies field is also deprecated and will work only with the existing workflow. Instead, you should configure a suitable Vault role and use that.

The following agent configuration fields are deprecated:

vault.allow_unauthenticated will be removed in Nomad 1.10. Tasks will use the workload identity without the user supplying a Vault token.
vault.task_token_ttl will be removed in Nomad 1.10. With workload identity, tasks will receive their TTL configuration from the Vault role.
vault.token will be removed in Nomad 1.10. Nomad agents will no longer make requests to authenticated endpoints except with a task's workload identity.

Before upgrading to Nomad 1.10 you will need to have configured authentication with Vault to work with workload identity. Refer to Migrating to Using Workload Identity with Vault for more details.

Consul Integration Changes

Starting in Nomad 1.7, Nomad clients will use a service's or task's Workload Identity to authenticate to Consul and obtain a Consul token specific to the workload.

The existing workflow using a Consul token provided in either the agent configuration or at the time of job submission is deprecated and will be removed in Nomad 1.10. The consul.allow_unauthenticated agent configuration field will be removed in Nomad 1.10. Tasks will use the workload identity without the user supplying a Consul token.

Before upgrading to Nomad 1.10 you will need to have configured authentication with Consul to work with workload identity. See Migrating to Using Workload Identity with Consul for more details.

RS256 JWT Signing Algorithm Support

Prior to Nomad 1.7, workload identity JWTs were signed with the EdDSA algorithm. While EdDSA has numerous advantages as a signing algorithm, most third parties that accept JWTs expect the RS256 signing algorithm to be used.

Therefore starting in Nomad 1.7 new signing keys will generate an RSA key and sign workload identities with the RS256 signing algorithm.

Before setting up third party authentication methods to use workload identities, it is recommended to run nomad operator root keyring rotate to ensure you generate a new RSA key.

To verify an RSA key is present you may check the /.well-known/jwks.json endpoint on any Nomad agent. If you see "kty": "RSA", then an RSA key exists and you do not need to rotate keys.

New Nomad clusters will use RSA by default and are not affected.

CPU Fingerprinting Changes

Starting in Nomad 1.7, Nomad clients improve the accuracy of detected CPU performance metrics. The fingerprinter now takes into account heterogeneous core types on applicable processors. In addition, Nomad will attempt to detect and use the base frequency of the processor rather than the turbo frequency when calculating the total available CPU bandwidth. The net result of these behaviors is that the calculated total CPU bandwidth available on a node may change when upgrading to Nomad 1.7. Operators are encouraged to ensure planned capacity meets expectations before upgrading. The CPU concepts documentation contains guidance in understanding how Nomad detects CPU metrics.

CPU EC2 Detection Changes

Prior to Nomad 1.7, Nomad clients embedded a large lookup table of CPU performance data for every EC2 instance type. In 1.7 and later Nomad instead gathers this data by executing the dmidecode command. The dmidecode package must be installed manually on some Linux distributions before the Nomad agent is started.

CPU Core Isolation

Starting in Nomad 1.7, Nomad tasks that specify CPU resources using the cores attribute will be restricted to using only the CPU cores assigned to them. In previous versions of Nomad these tasks could also make use of other non-reserved CPU cores. However this feature would cause severe performance problems for the Linux kernel as the number of tasks increased. Operators are encouraged to ensure tasks making use of the cores attribute are given sufficient CPU resources before upgrading.

The `distinct_hosts` Constraint Now Honors Namespaces

Nomad 1.7.0 changes the behavior of the distinct_hosts constraint such that namespaces are taken into account when choosing feasible clients for allocation placement. The previous, less-expected behavior would cause any job with the same name running on a client to cause that node to be considered infeasible.

This change allows workloads that formerly did not colocate to be scheduled onto the same client when they are in different namespaces. To prevent this, consider using node pools and constrain the jobs with a distinct_property constraint over ${node.pool}.
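The suggested constraint could be written as follows. The value of 1 is an assumption that limits placement to one allocation per node pool.

```hcl
constraint {
  operator  = "distinct_property"
  attribute = "${node.pool}"
  value     = "1" # assumed limit: one allocation per node pool
}
```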

Loading Binaries from `plugin_dir` Without Configuration

Starting with Nomad 1.7.0, loading plugins that are not referenced in the agent configuration file is deprecated. Future versions of Nomad will only load plugins that have a corresponding plugin block in the agent configuration file.

Changes to `raw_exec`

The raw_exec task driver now enforces memory limits via cgroups on Linux platforms similar to the exec and docker task drivers. The driver does support memory oversubscription, which can be configured in such a way to nearly replicate the previously unlimited behavior.
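A task that relied on the previously unlimited behavior might set a low reservation with a high ceiling, as in this sketch. The values are hypothetical, and memory oversubscription must also be enabled in the cluster's scheduler configuration for memory_max to take effect.

```hcl
task "batch-tool" {
  driver = "raw_exec"

  resources {
    memory     = 256  # MB reserved for scheduling (hypothetical)
    memory_max = 8192 # MB ceiling enforced by the cgroup (hypothetical)
  }

  # ... rest of task
}
```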

The no_cgroups configuration option no longer has any effect. Previously, setting no_cgroups would disable the mechanism where Nomad used the freezer cgroup to halt the process group of a Task before issuing a kill signal to each process. Starting in Nomad 1.7.0 this behavior is always enabled (and a similar mechanism has always been enabled on cgroups v2 systems).

Nomad 1.6.14

Enterprise

Nomad keyring rotation

In Nomad 1.6.14, the Nomad root keyring will prepublish keys at half the root_key_rotation_threshold and promote them to active once the root_key_rotation_threshold has passed. The nomad operator root keyring rotate command now requires one of two arguments: -prepublish <duration> to prepublish a key or -now to rotate immediately. We recommend using -prepublish to avoid outages from workload identities used to log into external services such as Vault or Consul.

Nomad 1.6.13

Enterprise

New `windows_allow_insecure_container_admin` configuration option for Docker driver

In 1.6.13, Nomad will refuse to run jobs that use the Docker driver on Windows with Process Isolation that run as ContainerAdmin. This is in order to provide a more secure environment for these jobs, and this behavior can be overridden by setting the new windows_allow_insecure_container_admin Docker plugin configuration option to true or by setting privileged=true.

New default isolation mode for Docker on Windows

Nomad 1.6.13 changes the default isolation mode for Docker tasks on Windows from process to hyperv, since hyperv provides a much more secure execution environment.

Nomad 1.6.0

Enterprise License Validation with BuildDate

Nomad Enterprise 1.6.0 now compares license ExpirationTime with the Nomad binary's BuildDate, rather than comparing the sometimes more lenient license TerminationTime with time.Now(). See the licensing FAQ for more info, but most relevant here is that you should run the new nomad license inspect command before trying to upgrade your Enterprise servers to v1.6.0 or higher.

Job Evaluate API Endpoint Requires `submit-job` Instead of `read-job`

Nomad 1.6.0 updated the ACL capability requirement for the job evaluate endpoint from read-job to submit-job to better reflect that this operation writes state to Nomad. This endpoint is used by the nomad job eval CLI command and so the ACL requirements changed for the command as well. Users that called this endpoint or used this command using tokens with just the read-job capability or the read policy must update their tokens to use the submit-job capability or the write policy.
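For tokens scoped by capability rather than by the coarse write policy, an updated ACL policy could look like this sketch. The namespace label is illustrative.

```hcl
namespace "default" {
  # read-job alone no longer permits job evaluation.
  capabilities = ["read-job", "submit-job"]
}
```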

Exec Driver Requires New Capability for mlock

Nomad 1.6.0 updated the exec task driver to maintain the max memory locked limit set by the host system. In earlier versions of Nomad this limit was unset unintentionally.

In practice this means that exec tasks such as Vault which use the mlock system call will now need to explicitly add the ipc_lock capability.

First allow the ipc_lock capability in the Client configuration:

plugin "exec" {
  config {
    allow_caps = ["audit_write", "chown", "dac_override", "fowner", "fsetid",
      "kill", "mknod", "net_bind_service", "setfcap", "setgid", "setpcap",
      "setuid", "sys_chroot", "ipc_lock"]
  }
}

Then add the ipc_lock capability to the exec task that uses mlock:

task "vault" {
  driver = "exec"

  config {
    cap_add = ["ipc_lock"]

    # ... other task configuration
  }

}

# ... rest of jobspec

These additions are backward compatible with Nomad v1.5, so Clients and Jobs should be updated prior to upgrading to Nomad v1.6.

See #17780 for details.

Namespace ACL policies require a label

Nomad 1.6.0 does not allow ACL policies for namespaces without a label. Prior to this version, ACL policies for namespaces were allowed to be defined without a label, and the documented behavior in this case was that the policy would be applied to the default namespace.

A bug in this logic caused the policy to be incorrectly applied to a different namespace. For example, the policy below would be applied to a namespace called policy instead of default.

namespace {
  policy = "read"
}

To avoid further confusion and potential security incidents, this functionality was removed and now all namespace policies are required to have a label.

Tokens currently attached to an invalid policy will stop working after the upgrade, so you should fix invalid policies to have an explicit namespace label before upgrading Nomad.

After the policies are fixed, the existing tokens with those policies will continue to work and do not need to be regenerated.
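A corrected version of the policy above, assuming the intent was to target the default namespace, adds the explicit label:

```hcl
namespace "default" {
  policy = "read"
}
```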

Command `nomad tls cert create` flag `-cluster-region` deprecated

Nomad 1.6.0 deprecates the -cluster-region flag of the nomad tls cert create command in favor of the standard -region flag. The -cluster-region flag will be removed in Nomad 1.7.0.

32-bit Intel Builds Deprecated

Starting with Nomad 1.6.0, HashiCorp will no longer release 32-bit Intel builds of Nomad and Nomad Enterprise (the builds named windows_386 and linux_386). Bug fixes will continue to be backported to the 1.5.x and 1.4.x versions so long as those major versions are still supported.

The 32-bit ARM build (linux_arm for the armhf architecture) is deprecated and may be removed in a future major version of Nomad. The 32-bit ARM build is not tested and may include bugs around platform-specific integer sizes. Using 64-bit builds for small form-factor hosts such as the Raspberry Pi is strongly recommended.

Nomad 1.5.7, 1.4.11

Namespace ACL policies require a label

Nomad 1.5.7 and 1.4.11 do not allow ACL policies for namespaces without a label. Prior to these versions, ACL policies for namespaces were allowed to be defined without a label, and the documented behavior in this case was that the policy would be applied to the default namespace.

A bug in this logic caused the policy to be incorrectly applied to a different namespace. For example, the policy below would be applied to a namespace called policy instead of default.

namespace {
  policy = "read"
}

To avoid further confusion and potential security incidents, this functionality was removed and now all namespace policies are required to have a label.

Tokens currently attached to an invalid policy will stop working after the upgrade, so you should fix invalid policies to have an explicit namespace label before upgrading Nomad.

After the policies are fixed, the existing tokens with those policies will continue to work and do not need to be regenerated.

Nomad 1.5.5

Nomad 1.5.5 fixed a bug where allocations that are rescheduled for jobs registered before the upgrade would no longer collect allocation logs. The logs.enabled field introduced in 1.5.4 is now deprecated and has been replaced by a logs.disabled field that defaults to false. The logs.enabled field value will be ignored in 1.5.5 and will be removed in Nomad 1.6.0.
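A task that previously set logs.enabled to false would use the replacement field instead, roughly as follows. The task name is hypothetical.

```hcl
task "quiet" {
  logs {
    # Replaces enabled = false; defaults to false when omitted.
    disabled = true
  }

  # ... rest of task
}
```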

Nomad 1.5.4

Nomad 1.5.4 included a bug where allocations that are rescheduled for jobs registered before the upgrade would no longer collect allocation logs. The client will emit debug-level logs like the following:

client.alloc_runner.task_runner.task_hook: log collection is disabled by task

You should avoid this version of Nomad and instead install the latest version of Nomad 1.5. If you have already upgraded to Nomad 1.5.4, upgrading to Nomad 1.5.5 will restore logging collection when clients are restarted as part of the upgrade process.

Nomad 1.5.1

Artifact Download Regression Fix

Nomad 1.5.1 reverts a behavior of 1.5.0 where artifact downloads were executed as the nobody user on compatible Linux systems. This was done optimistically as a defense against compromised artifact endpoints attempting to exploit the Nomad Client or the tools it uses to perform downloads, such as git or mercurial. Unfortunately, running the child process as any user other than root is not compatible with the advice given in Nomad's security hardening guide, which calls for a specific directory tree structure that makes such operation impossible.

Other changes to artifact downloading remain: downloads are executed as a child process of the Nomad agent, and on modern Linux systems make use of the kernel's landlock feature to restrict filesystem access from that process.

Nomad 1.5.0

Pause Container Reconciliation Regression

Nomad 1.5.0 introduced a regression to the way the Docker driver reconciles dangling containers. Pause containers would be erroneously removed even though the allocation was still running. This does not affect the running allocation, but causes it to fail if it needs to restart. An immediate workaround is to disable dangling container reconciliation.
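The workaround can be sketched in the client's Docker plugin configuration as follows. Note that this turns off cleanup of genuinely dangling containers too, so it is best treated as temporary.

```hcl
plugin "docker" {
  config {
    gc {
      dangling_containers {
        # Disable reconciliation until the client runs a fixed version.
        enabled = false
      }
    }
  }
}
```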

Artifact Download Sandboxing

Nomad 1.5.0 changes the way artifacts are downloaded when specifying an artifact in a task configuration. Previously the Nomad Client would download artifacts in-process. External commands used to facilitate the download (e.g. git, hg) would be run as root, and the resulting payload would be owned by root in the allocation's task directory.

In an effort to improve the resilience and security model of the Nomad Client, in 1.5.0 artifact downloads occur in a sub-process. Where possible, that sub-process is run as the nobody user, and on modern Linux systems will be isolated from the filesystem via the kernel's landlock capability.

Operators are encouraged to ensure jobs making use of artifacts continue to work as expected. In particular, git-ssh users will need to make sure the system-wide /etc/ssh/ssh_known_hosts file is populated with any necessary remote hosts. Previously, Nomad's documentation suggested configuring /root/.ssh/known_hosts which would apply only to the root user.

The artifact downloader no longer inherits all environment variables available to the Nomad Client. The downloader sub-process environment is set as follows on Linux / macOS:

PATH=/usr/local/bin:/usr/bin:/bin
TMPDIR=<path to task dir>/tmp

and as follows on Windows:

TMP=<path to task dir>\tmp
TEMP=<path to task dir>\tmp
PATH=<inherit $PATH>
HOMEPATH=<inherit $HOMEPATH>
HOMEDRIVE=<inherit $HOMEDRIVE>
USERPROFILE=<inherit $USERPROFILE>

Configuration of the artifact downloader should happen through the options and headers fields of the artifact block. For backwards compatibility, the sandbox can be configured to inherit specified environment variables from the Nomad client by setting set_environment_variables.

The use of filesystem isolation can be disabled in Client configuration by setting disable_filesystem_isolation.
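A sketch of both client settings, assuming they live in the client artifact block. The variable names are examples only, and disabling isolation should be a last resort.

```hcl
client {
  artifact {
    # Comma-separated names of variables the sandbox inherits from the
    # Nomad client's environment (example names, not defaults).
    set_environment_variables = "HTTPS_PROXY,NO_PROXY"

    # Turns off filesystem isolation for the downloader sub-process.
    disable_filesystem_isolation = true
  }
}
```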

Artifact Decompression Limits

Nomad 1.5.0 now sets default limits around artifact decompression. A single artifact payload is now limited to 100GB and 4096 files when decompressed. An artifact that exceeds these limits during decompression will cause the artifact downloader to fail. These limits can be adjusted or disabled in the client artifact configuration by setting decompression_size_limit and decompression_file_count_limit.
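For example, a client that legitimately unpacks larger archives might raise both limits. The values below are illustrative, not recommendations.

```hcl
client {
  artifact {
    # Defaults are 100GB and 4096 files, as described above.
    decompression_size_limit       = "200GB"
    decompression_file_count_limit = 8192
  }
}
```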

Datacenter Wildcards

In Nomad 1.5.0, the datacenters field for a job accepts wildcards for multi-character matching. For example, datacenters = ["dc*"] will match all datacenters that start with "dc". The default value for datacenters is now ["*"], so the field can be omitted.
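A minimal sketch of a job using the wildcard form; the job name is hypothetical and the rest of the job is omitted.

```hcl
job "example" {
  # Matches dc1, dc2, dc-east, and any other datacenter starting with "dc".
  # Omitting the field entirely is equivalent to datacenters = ["*"].
  datacenters = ["dc*"]

  # ... groups and tasks ...
}
```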

The * character is no longer a legal character in the datacenter field for an agent configuration. Before upgrading to Nomad 1.5.0, you should first ensure that you've updated any jobs that currently have a * in their datacenter name and then ensure that no agents have this character in their datacenter field name.

Server `rejoin_after_leave` (default: `false`) now enforced

All Nomad versions prior to v1.5.0 have incorrectly ignored the Server rejoin_after_leave configuration option. This bug has been fixed in Nomad version v1.5.0.

Prior to v1.5.0, Nomad always behaved as if rejoin_after_leave were true, regardless of the server configuration, while the documentation incorrectly indicated a default of false.

Cluster operators should be aware that explicit leave events (such as nomad server force-leave) will now result in behavior which matches this configuration, and should review whether they were inadvertently relying on the buggy behavior.
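Operators who decide they do depend on the old behavior can opt in explicitly. A minimal sketch of the server configuration:

```hcl
server {
  enabled = true

  # Restores the pre-1.5.0 effective behavior: the server attempts to
  # rejoin the cluster on start even after an explicit leave.
  rejoin_after_leave = true
}
```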

Changes to eval broker metrics

The metric nomad.nomad.broker.total_blocked has been renamed to nomad.nomad.broker.total_pending. The metric reports an internal state of the leader's broker, and the old name was easily confused with the unrelated evaluation status "blocked" in the Nomad API.

Deprecated gossip keyring commands removed

The commands nomad operator keyring, nomad keyring, nomad operator keygen, and nomad keygen used to manage the gossip keyring were marked as deprecated in Nomad 1.4.0. In Nomad 1.5.0, these commands have been removed. Use the nomad operator gossip keyring commands to manage the gossip keyring.

Garbage collection of evaluations and allocations for batch jobs

Versions prior to 1.5.0 only delete evaluations and allocations of batch jobs that are explicitly stopped, which can lead to unbounded memory growth of Nomad when the batch job is executed multiple times.

Nomad 1.5.0 introduces a new server configuration batch_eval_gc_threshold to control how allocations and evaluations for batch jobs are collected.

The default threshold is 24h. If you need to access completed allocations for batch jobs that are older than 24h you must increase this value when upgrading Nomad.
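For instance, to keep completed batch allocations and evaluations for three days, the server configuration might look like this. The 72h value is only an example.

```hcl
server {
  enabled = true

  # Default is 24h; raise it if older batch allocations must stay queryable.
  batch_eval_gc_threshold = "72h"
}
```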

Nomad 1.4.5, 1.3.10

Pause Container Reconciliation Regression

Nomad 1.4.5 and 1.3.10 introduced a regression to the way the Docker driver reconciles dangling containers. Pause containers would be erroneously removed even though the allocation was still running. This does not affect the running allocation, but causes it to fail if it needs to restart. An immediate workaround is to disable dangling container reconciliation.

Nomad 1.4.4, 1.3.9

Garbage collection of evaluations and allocations for batch jobs

Versions prior to 1.4.4 and 1.3.9 only delete evaluations and allocations of batch jobs that are explicitly stopped, which can lead to unbounded memory growth of Nomad when the batch job is executed multiple times.

Nomad 1.4.4 and 1.3.9 introduce a new server configuration batch_eval_gc_threshold to control how allocations and evaluations for batch jobs are collected.

The default threshold is 24h. If you need to access completed allocations for batch jobs that are older than 24h you must increase this value when upgrading Nomad.

Nomad 1.4.0

Possible Panic During Upgrades

Nomad 1.4.0 initializes a keyring on the leader if one has not been previously created, which writes a new raft entry. Users have reported that the keyring initialization can cause a panic on older servers during upgrades. Following the documented upgrade process closely will reduce the risk of this panic. But if a server with version 1.4.0 becomes leader while servers with versions before 1.4.0 are still in the cluster, the older servers will panic.

The most likely scenario for this is if the leader is still on a version before 1.4.0 and is netsplit from the rest of the cluster or the server is restarted without upgrading, and one of the 1.4.0 servers becomes the leader.

You can recover from the panic by immediately upgrading the old servers. This bug was fixed in Nomad 1.4.1.

Raft Protocol Version 2 Unsupported

Raft protocol version 2 was deprecated in Nomad v1.3.0, and is being removed in Nomad v1.4.0. In Nomad 1.3.0, the default raft protocol version was updated to version 3, and in Nomad 1.4.0 Nomad requires the use of raft protocol version 3. If raft_protocol version is explicitly set, it must now be set to 3. For more information refer to the Upgrading to Raft Protocol 3 guide.
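If the setting is pinned in the server configuration, it would need to read as follows before upgrading to 1.4.0. Removing the line entirely should have the same effect, since 3 is the default.

```hcl
server {
  enabled       = true
  raft_protocol = 3
}
```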

Audit logs filtering logic changed

Audit Log filtering in previous versions of Nomad handled stages and operations filters as OR filters. If either condition was met, the logs would be filtered. As of 1.4.0, stages and operations are treated as AND filters. Logs will only be filtered if all filter conditions match.
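To illustrate the difference, consider the following sketch of an audit filter. The filter name, endpoint, stage, and operation are illustrative choices, not defaults.

```hcl
audit {
  enabled = true

  filter "metrics-gets" {
    type       = "HTTPEvent"
    endpoints  = ["/v1/metrics"]
    stages     = ["OperationReceived"]
    operations = ["GET"]
  }
}
```

Under the pre-1.4.0 OR semantics described above, an event matching either the stage or the operation would be filtered. From 1.4.0 on, an event is filtered only when it matches both.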

Prevent Overlapping New Allocations with Stopping Allocations

Prior to Nomad 1.4.0 the scheduler would consider the resources used by allocations that are in the process of stopping to be free for new allocations to use. This could cause newer allocations to crash when they try to use TCP ports or memory used by an allocation in the process of stopping. The new and stopping allocations would "overlap" improperly.

Nomad 1.4.0 fixes this behavior so that an allocation's resources are only considered free for reuse once the client node the allocation was running on reports it has stopped. Technically speaking: only once the Allocation.ClientStatus has reached a terminal state (complete, failed, or lost).

Despite this being a bug fix, it is considered a significant enough change in behavior to reserve for a major Nomad release and not be backported. Please report any negative side effects encountered as new issues.

`nomad eval status -json` Without Evaluation ID Removed

Using nomad eval status -json without providing an evaluation ID was deprecated in Nomad 1.2.4 with the intent to remove in Nomad 1.4.0. This option has been removed. You can use nomad eval list to get a list of evaluations and can use nomad eval list -json to get that list in JSON format. The nomad eval status <eval ID> command will format a specific evaluation in JSON format if the -json flag is provided.

Removing Vault/Consul from Clients

Nomad clients no longer have their Consul and Vault fingerprints cleared when connectivity is lost with Consul and Vault. To intentionally remove Consul and Vault from a client node, you will need to restart the Nomad client agent.

Numeric Operand Comparisons in Constraints

Prior to Nomad 1.4.0 the <, <=, >, >= operators in a constraint would always compare the operands lexically. This behavior has been changed so that the comparison is done numerically if both operands are integers or floats.
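The following constraint sketch shows a case where the two comparison modes disagree. The attribute is just an example of an integer-valued node attribute.

```hcl
constraint {
  attribute = "${attr.cpu.numcores}"
  operator  = ">="
  value     = "10"
}
```

On a node with 8 cores, a lexical comparison treats "8" as greater than "10" (because "8" sorts after "1"), so the node would have been eligible before 1.4.0. With numeric comparison, 8 >= 10 is false and the node is filtered out.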

Nomad 1.3.3

Environments that don't support the use of uid and gid in template blocks, such as Windows clients, may experience task failures with the following message after upgrading to Nomad 1.3.3:

Template failed: error rendering "(dynamic)" => "...": failed looking up user: managing file ownership is not supported on Windows

It is recommended to avoid this version of Nomad in such environments.

Nomad 1.3.2, 1.2.9, 1.1.15

Client `max_kill_timeout` now enforced

Nomad versions since v0.9 have incorrectly ignored the Client max_kill_timeout configuration option. This bug has been fixed in Nomad versions v1.3.2, v1.2.9, and v1.1.15. Job submitters should be aware that a Task's kill_timeout will be reduced to the Client's max_kill_timeout if the value exceeds the maximum.
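As a sketch of the interaction, assume a client configured with a 30 second maximum (the values and task name are hypothetical):

```hcl
# Client agent configuration
client {
  max_kill_timeout = "30s"
}
```

```hcl
# Job file fragment: the task asks for 45s, but on the client above
# it is now capped at 30s.
task "server" {
  kill_timeout = "45s"
}
```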

Nomad 1.3.1, 1.2.8, 1.1.14

Default `artifact` limits

Nomad 1.3.1, 1.2.8, and 1.1.14 introduced mechanisms to limit the size of artifact downloads and how long these operations can take. The limits are defined in the new artifact client configuration and have predefined default values.

While the defaults set are fairly large, it is recommended to double-check them prior to upgrading your Nomad clients to make sure they fit your needs.
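A sketch of what reviewing the limits might look like in the client configuration. Only a few of the available options are shown, and the values are illustrative; consult the artifact client configuration reference for the full list and the actual defaults.

```hcl
client {
  artifact {
    http_read_timeout = "30m"
    http_max_size     = "100GB"
    git_timeout       = "30m"
    s3_timeout        = "30m"
  }
}
```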

Nomad 1.3.0

Raft Protocol Version 2 Deprecation

Raft protocol version 2 will be removed from Nomad in the next major release of Nomad, 1.4.0.

In Nomad 1.3.0, the default raft protocol version has been updated to 3. If the raft_protocol version is not explicitly set, upgrading a server will automatically upgrade that server's raft protocol. Refer to the Upgrading to Raft Protocol 3 guide.

Client State Store

The client state store will be automatically migrated to a new schema version when upgrading a client.

Downgrading to a previous version of the client after upgrading it to Nomad 1.3 is not supported. To downgrade safely, users should drain all tasks from the Nomad client and erase its data directory.

CSI Plugins

The client filesystem layout for CSI plugins has been updated to correctly handle the lifecycle of multiple allocations serving the same plugin. Running plugin tasks will not be updated after upgrading the client, but it is recommended to redeploy CSI plugin jobs after upgrading the cluster.

The directory for plugin control sockets will be mounted from a new per-allocation directory in the client data dir. This will still be bind-mounted to csi_plugin.mount_dir as in versions of Nomad prior to 1.3.0.

The volume staging directory for new CSI plugin tasks will now be mounted to the task's NOMAD_TASK_DIR instead of the csi_plugin.mount_dir.

Raft leadership transfer on error

Starting with Nomad 1.3.0, when a Nomad server is elected the Raft leader but fails to complete the process to start acting as the Nomad leader it will attempt to gracefully transfer its Raft leadership status to another eligible server in the cluster. This operation is only supported when using Raft Protocol Version 3.

Server Raft Database

The server raft database in raft.db will be automatically migrated to a new underlying implementation provided by go.etcd.io/bbolt. Downgrading to a previous version of the server after upgrading it to Nomad 1.3 is not supported. As with any Nomad upgrade, it is recommended to take a snapshot of your database prior to upgrading in case a downgrade becomes necessary.

The new database implementation enables a new server configuration option for controlling the underlying freelist-sync behavior. Clusters experiencing extreme disk IO on servers may want to consider disabling freelist-sync to reduce load. The tradeoff is longer server startup times, as the database must be completely scanned to re-build the freelist from scratch.

server {
  raft_boltdb {
    no_freelist_sync = true
  }
}

Changes to the `nomad server members` command

The standard output of the nomad server members command replaces the previous Protocol column that indicated the Serf protocol version with a new column named Raft Version which outputs the Raft protocol version defined in each server.

The -detailed flag is now called -verbose and outputs the standard values in addition to extra information. The previous name is still supported but may be removed in future releases.

The previous Protocol value can be viewed using the -verbose flag.

Changes to `client.template.function_denylist` configuration

consul-template v0.28 added a new function writeToFile which can write to arbitrary files on the host.

Nomad 1.3.0 disables this function by default in its function_denylist.

However, if you have overridden the default template.function_denylist in your client configuration, you must add writeToFile to your denylist. Failing to do so allows templates to write to arbitrary paths on the host.
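A minimal sketch of an overridden denylist that keeps writeToFile blocked. The plugin entry is assumed to be carried over from an existing override; keep whatever functions your configuration already denies.

```hcl
client {
  template {
    function_denylist = ["plugin", "writeToFile"]
  }
}
```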

Changes to Envoy metrics labels

When using Envoy as a sidecar proxy for Connect enabled services, Nomad will now automatically inject the unique allocation ID into Envoy's stats tags configuration. Users who wish to set the tag values themselves may do so using the proxy.config block.

connect {
  sidecar_service {
    proxy {
      config {
        envoy_stats_tags = ["nomad.alloc_id=<allocID>"]
      }
    }
  }
}

Changes to Consul Connect Service Identity Tokens

Starting with Nomad 1.3.0, Consul Service Identity Tokens created automatically by Nomad on behalf of Connect services will now be created as local tokens. These tokens will no longer be replicated globally. To facilitate cross-Consul datacenter requests of Connect services registered by Nomad, Consul agents will need to be configured with default anonymous ACL tokens with ACL policies of sufficient permissions to read service and node metadata pertaining to those requests. This mechanism is described in Consul issue 7414. A typical Consul agent anonymous token may contain an ACL policy such as:

service_prefix "" { policy = "read" }
node_prefix    "" { policy = "read" }

The minimum version of Consul supported by Nomad's Connect integration is now Consul v1.8.0.

Changes to task groups that utilise Consul services and checks

Starting with Nomad 1.3.0, services and checks that utilise Consul will have an automatic constraint placed upon the task group. This ensures they are placed on a client with a Consul agent running that meets a minimum version requirement. The minimum version of Consul supported by Nomad's service and check blocks is now Consul v1.7.0.

Linux Control Groups Version 2

Starting with Nomad 1.3.0, Linux systems configured to use cgroups v2 are now supported. A Nomad client will only activate its v2 control groups manager if the system is configured with the cgroups2 controller mounted at /sys/fs/cgroup.

Systems that do not support cgroups v2 are not affected.
Systems configured in hybrid mode typically mount the cgroups2 controller at /sys/fs/cgroup/unified, so Nomad will continue to use cgroups v1 for these hosts.
Systems configured with only cgroups v2 now correctly support setting cpu cores.

Nomad will preserve the existing cgroup for tasks when a client is upgraded, so there will be no disruption to tasks. A new client attribute unique.cgroup.version indicates which version of control groups Nomad is using.

When cgroups v2 are in use, Nomad uses nomad.slice as the default parent for cgroups created on behalf of tasks. The cgroup created for a task is named in the form <allocID>.<task>.scope. These cgroups are created by Nomad before a task starts. External task drivers that support containerization should be updated to make use of the new cgroup locations.

The new cgroup file system layout will look like the following:

➜ tree -d /sys/fs/cgroup/nomad.slice
/sys/fs/cgroup/nomad.slice
├── 8b8da4cf-8ebf-b578-0bcf-77190749abf3.redis.scope
└── a8c8e495-83c8-311b-4657-e6e3127e98bc.example.scope

Support for pre-0.9 Tasks Removed

Running tasks that were created on clusters from Nomad version 0.9 or earlier will fail to restore after upgrading a cluster to Nomad 1.3.0. To safely upgrade without unplanned interruptions, force these tasks to be rescheduled with nomad alloc stop before upgrading. Note this only applies to tasks that have been running continuously from before 0.9 without rescheduling. Jobs that were created before 0.9 but have had tasks replaced over time after 0.9 will operate normally during the upgrade.

Nomad 1.2.6, 1.1.12, and 1.0.18

ACL requirement for the job parse endpoint

Nomad 1.2.6, 1.1.12, and 1.0.18 require ACL authentication for the job parse API endpoint. The parse-job capability has been created to allow access to this endpoint. The submit-job, read, and write policies include this capability.

The capability must be enabled for the namespace used in the API request.
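For tokens whose policies list capabilities individually, the new capability can be granted as in this sketch. The namespace and the accompanying read-job capability are assumptions for illustration.

```hcl
namespace "default" {
  capabilities = ["read-job", "parse-job"]
}
```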

Nomad 1.2.4

`nomad eval status -json` deprecated

Nomad 1.2.4 includes a new nomad eval list command that has the option to display the results in JSON format with the -json flag. This replaces the existing nomad eval status -json option. In Nomad 1.4.0, nomad eval status -json will be changed to display only the selected evaluation in JSON format.

Nomad 1.2.2

Panic on node class filtering for system and sysbatch jobs fixed

Nomad 1.2.2 fixes a server crashing bug present in the scheduler node class filtering since 1.2.0. Users should upgrade to Nomad 1.2.2 to avoid this problem.

Nomad 1.2.0

Nvidia device plugin

The Nvidia device is now an external plugin and must be installed separately. Refer to the Nvidia device plugin's documentation for details.

ACL requirements for accessing the job details page in the Nomad UI

Nomad 1.2.0 introduced a new UI component to display the status of system and sysbatch jobs in each client where they are running. This feature makes an API call to an endpoint that requires node:read ACL permission. Tokens used to access the Nomad UI will need to be updated to include this permission in order to access a job details page.

This was an unintended change fixed in Nomad 1.2.4.

HCLv2 Job Specification Parsing

In previous versions of Nomad, when rendering a job specification using override variables, a warning would be returned if a variable within an override file was declared that was not found within the job specification. This behaviour differed from passing variables via the -var flag, which would always cause an error in the same situation.

Nomad 1.2.0 made this behaviour consistent: an error is now returned by default whenever an override variable is specified that is not a known variable within the job specification. Users who wish to only be warned when this situation arises can specify the -hcl-strict=false flag.

Nomad 1.0.11 and 1.1.5 Enterprise

Audit log file names

Audit log file naming now matches the standard log file naming introduced in 1.0.10 and 1.1.4. The audit log currently being written will no longer have a timestamp appended.

Nomad 1.0.10 and 1.1.4

Log file names

The log_file configuration option was not being fully respected, as the generated filename would include a timestamp. After upgrade, the active log file will always be the value defined in log_file, with timestamped files being created during log rotation.

Nomad 1.0.9 and 1.1.3

Namespace in Job Run and Plan APIs

The Job Run and Plan APIs now respect the ?namespace=... query parameter over the namespace specified in the job itself. This matches the precedence of region and fixes a bug where the -namespace flag was not respected for the nomad run and nomad apply commands.

For users of api.Client who want their job namespace respected, you must ensure the Config.Namespace field is unset.

Docker Driver

1.1.3 only

Starting in Nomad 1.1.2, task groups with network.mode = "bridge" generated a hosts file in Docker containers. This generated hosts file was bind-mounted from the task directory to /etc/hosts within the task. In Nomad 1.1.3 the source for the bind mount was moved to the allocation directory so that it is shared between all tasks in an allocation.

Please note that this change may prevent extra_hosts values from being properly set in each task when there are multiple tasks within the same group. When using extra_hosts with Consul Connect in bridge network mode, you should set the hosts values in the sidecar_task.config block instead.
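A sketch of the suggested placement, with a hypothetical service name, port, and host entry:

```hcl
group "api" {
  network {
    mode = "bridge"
  }

  service {
    name = "api"
    port = "9001"

    connect {
      sidecar_service {}

      sidecar_task {
        config {
          # Host entries set here rather than on each task's config.
          extra_hosts = ["upstream.internal:10.0.0.10"]
        }
      }
    }
  }
}
```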

Nomad 1.1.0

Enterprise licenses

Nomad Enterprise licenses are no longer stored in raft or synced between servers. Nomad Enterprise servers will not start without a license. There is no longer a six hour evaluation period when running Nomad Enterprise. Before upgrading, you must provide each server with a license on disk or in its environment. Refer to the Enterprise licensing documentation for details.

The nomad license put command has been removed.

The nomad license get command is no longer forwarded to the Nomad leader, and will return the license from the specific server being contacted.

Visit the Enterprise licensing page to get a trial license for Nomad Enterprise.

Agent Metrics API

The Nomad agent metrics API now respects the prometheus_metrics configuration value. If this value is set to false, which is the default value, calling /v1/metrics?format=prometheus will now result in a response error.
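Agents that are scraped by Prometheus therefore need the option enabled explicitly, for example:

```hcl
telemetry {
  # Required for /v1/metrics?format=prometheus to return metrics
  # instead of an error.
  prometheus_metrics = true
}
```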

CSI volumes

The volume specification for CSI volumes has been updated to support volume creation. The access_mode and attachment_mode fields have been moved to a capability block that can be repeated. Existing registered volumes will be automatically modified the next time that a volume claim is updated. Volume specification files for new volumes should be updated to the format described in the volume create and volume register commands.

The volume block has an access_mode and attachment_mode field that are required for CSI volumes. Jobs that use CSI volumes should be updated with these fields.
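The two changes can be sketched as follows. All IDs and names are hypothetical, and other fields a real volume specification needs (such as an external ID when registering) are omitted.

```hcl
# Volume specification file: modes now live in a repeatable capability block.
id        = "database"
name      = "database"
type      = "csi"
plugin_id = "example-plugin"

capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}
```

```hcl
# Job file fragment: the volume block must now state both modes.
group "db" {
  volume "data" {
    type            = "csi"
    source          = "database"
    access_mode     = "single-node-writer"
    attachment_mode = "file-system"
  }
}
```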

Connect native tasks

Connect native tasks running in host networking mode will now have CONSUL_HTTP_ADDR set automatically. Before this was only the case for bridge networking. If an operator already explicitly set CONSUL_HTTP_ADDR then it will not get overridden.

Linux capabilities in exec/java

Following the security remediation in Nomad versions 0.12.12, 1.0.5, and 1.1.0-rc1, the exec and java task drivers will additionally no longer enable the following Linux capabilities by default:

AUDIT_CONTROL  AUDIT_READ  BLOCK_SUSPEND  DAC_READ_SEARCH  IPC_LOCK  IPC_OWNER  LEASE
LINUX_IMMUTABLE  MAC_ADMIN  MAC_OVERRIDE  NET_ADMIN  NET_BROADCAST  NET_RAW  SYS_ADMIN
SYS_BOOT  SYSLOG  SYS_MODULE  SYS_NICE  SYS_PACCT  SYS_PTRACE  SYS_RAWIO  SYS_RESOURCE
SYS_TIME  SYS_TTY_CONFIG  WAKE_ALARM

The capabilities now enabled by default are modeled after Docker default linux capabilities (excluding NET_RAW).

AUDIT_WRITE  CHOWN  DAC_OVERRIDE  FOWNER  FSETID  KILL  MKNOD  NET_BIND_SERVICE
SETFCAP  SETGID  SETPCAP  SETUID  SYS_CHROOT

A new allow_caps plugin configuration parameter for the exec and java task drivers can be used to restrict the set of capabilities allowed for use by tasks.

Tasks using the exec or java task drivers can add or remove desired linux capabilities using the cap_add and cap_drop task configuration options.
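As a sketch, an operator might allow a small set of capabilities on the client, and a task might then adjust its own set within that allowance. The capability choices, task name, and command are illustrative only.

```hcl
# Client agent configuration: capabilities tasks are permitted to use.
plugin "exec" {
  config {
    allow_caps = ["chown", "kill", "net_bind_service", "net_raw"]
  }
}
```

```hcl
# Job file fragment: add one capability and drop another for this task.
task "ping" {
  driver = "exec"

  config {
    command  = "/bin/ping"
    args     = ["-c", "1", "127.0.0.1"]
    cap_add  = ["net_raw"]
    cap_drop = ["kill"]
  }
}
```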

iptables

Nomad now appends its iptables rules to the NOMAD-ADMIN chain instead of inserting them as the first rule. This allows better control for user-defined iptables rules but users who append rules currently should verify that their rules are being appended in the correct order.

Nomad 1.1.0-rc1, 1.0.5, 0.12.12

Nomad versions 1.1.0-rc1, 1.0.5 and 0.12.12 change the behavior of the docker, exec, and java task drivers so that the CAP_NET_RAW Linux capability is disabled by default. This is one of the Linux capabilities that Docker itself enables by default, as this capability enables the generation of ICMP packets, used by the common ping utility for performing network diagnostics. When used by groups in bridge networking mode, the CAP_NET_RAW capability also exposes tasks to ARP spoofing, enabling DoS and MITM attacks against other tasks running in bridge networking on the same host. Operators should weigh potential impact of an upgrade on their applications against the security consequences inherent in CAP_NET_RAW. Typical applications using tcp or udp based networking should not be affected.

This is the sole change for Nomad 1.0.5 and 0.12.12, intended to provide better task network isolation by default.

Users of the docker driver can restore the previous behavior by configuring the allow_caps driver configuration option to explicitly enable the CAP_NET_RAW capability.

plugin "docker" {
  config {
    allow_caps = [
      "CHOWN", "DAC_OVERRIDE", "FSETID", "FOWNER", "MKNOD",
      "SETGID", "SETUID", "SETFCAP", "SETPCAP", "NET_BIND_SERVICE",
      "SYS_CHROOT", "KILL", "AUDIT_WRITE", "NET_RAW",
    ]
  }
}

An upcoming version of Nomad will include similar configuration options for the exec and java task drivers.

This change is limited to docker, exec, and java driver plugins. It does not affect the Nomad server. This only affects Nomad clients running Linux, with tasks using bridge networking and one of these task drivers, or third-party plugins which relied on the shared Nomad executor library.

Upgrading a Nomad client to 1.0.5 or 0.12.12 will not restart existing tasks. As such, processes from existing docker, exec, or java tasks will need to be manually restarted (using alloc stop or another mechanism) in order to be fully isolated.

Nomad 1.0.3, 0.12.10

Nomad versions 1.0.3 and 0.12.10 change the behavior of the exec and java drivers so that tasks are isolated in their own PID and IPC namespaces. As a result, the process launched by these drivers will be PID 1 in the namespace. This has significant impact on the treatment of a process by the Linux kernel. Furthermore, tasks in the same allocation will no longer be able to coordinate using signals, SystemV IPC objects, or POSIX message queues. Operators should weigh potential impact of an upgrade on their applications against the security consequences inherent in using the host namespaces.

This is the sole change for Nomad 1.0.3, intended to provide better process isolation by default. An upcoming version of Nomad will include options for configuring this behavior.

This change is limited to the exec and java driver plugins. It does not affect the Nomad server. This only affects Nomad clients running on Linux, using the exec or java drivers or third-party driver plugins which relied on the shared Nomad executor library.

Upgrading a Nomad client to 1.0.3 or 0.12.10 will not restart existing tasks. As such, processes from existing exec/java tasks will need to be manually restarted (using alloc stop or another mechanism) in order to be fully isolated.

Nomad 1.0.2

Dynamic secrets trigger template changes on client restart

Nomad 1.0.2 changed the behavior of template change_mode triggers when a client node restarts. In Nomad 1.0.1 and earlier, the first rendering of a template after a client restart would not trigger the change_mode. For dynamic secrets such as the Vault PKI secrets engine, this resulted in the secret being updated but not restarting or signalling the task. When the secret's lease expired at some later time, the task workload might fail because of the stale secret. For example, a web server's SSL certificate would be expired and browsers would be unable to connect.

In Nomad 1.0.2, when a client node is restarted any task with Vault secrets that are generated or have expired will have its change_mode triggered. If change_mode = "restart" this will result in the task being restarted, to avoid the task failing unexpectedly at some point in the future. This change only impacts tasks using dynamic Vault secrets engines such as PKI, or when secrets are rotated. Secrets that don't change in Vault will not trigger a change_mode on client restart.

Nomad 1.0.1

Envoy worker threads

Nomad v1.0.0 changed the default behavior around the number of worker threads created by Envoy when used as a sidecar for Consul Connect. In Nomad v1.0.1, the same default setting of --concurrency=1 is set for Envoy when used as a Connect gateway. As before, the meta.connect.proxy_concurrency property can be set in client configuration to override the default value.
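A sketch of the override in client configuration; the value 2 is an arbitrary example.

```hcl
client {
  meta {
    # Number of Envoy worker threads for sidecars and gateways on this client.
    "connect.proxy_concurrency" = "2"
  }
}
```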

Nomad 1.0.0

HCL2 for Job specification

Nomad v1.0.0 adopts HCL2 for parsing the job spec. HCL2 extends HCL with more expression and reuse support, but adds some stricter schema for HCL blocks. Refer to the HCL documentation for more details.

Signal used when stopping Docker tasks

When stopping tasks running with the Docker task driver, Nomad documents that a SIGTERM will be issued (unless configured with kill_signal). However, recent versions of Nomad would issue SIGINT instead. Starting again with Nomad v1.0.0 SIGTERM will be sent by default when stopping Docker tasks.
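Workloads that came to depend on receiving SIGINT can pin the signal per task. A minimal sketch, with a hypothetical task name and image:

```hcl
task "server" {
  driver = "docker"

  # Keep the signal that recent pre-1.0.0 versions actually sent.
  kill_signal = "SIGINT"

  config {
    image = "example/server:1.0" # hypothetical image
  }
}
```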

Deprecated metrics have been removed

Nomad v0.7.0 added support for tagged metrics and deprecated untagged metrics. There was support for configuring backwards-compatible metrics. This support has been removed with v1.0.0, and all metrics will be emitted with tags.

Null characters in region, datacenter, job name/ID, task group name, and task names

Starting with Nomad v1.0.0, jobs will fail validation if any of the following contain a null character: the job ID or name, the task group name, or the task name. Any jobs containing null characters in these fields should be modified before an update to v1.0.0. Similarly, client and server config validation will prohibit either the region or the datacenter from containing null characters.

EC2 CPU characteristics may be different

Starting with Nomad v1.0.0, the AWS fingerprinter uses data derived from the official AWS EC2 API to determine default CPU performance characteristics, including core count and core speed. This data should be accurate for each instance type per region. Previously, Nomad used a hand-made lookup table that was not region aware and may have contained inaccurate or incomplete data. As part of this change, the AWS fingerprinter no longer sets the cpu.modelname attribute.

As before, cpu_total_compute can be used to override the discovered CPU resources available to the Nomad client.
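As a rough sketch, the override goes in the client block of the agent configuration. The value below is an illustrative assumption (total compute in MHz), not a recommendation for any particular instance type.

```hcl
client {
  enabled = true

  # Overrides the CPU capacity discovered by fingerprinting.
  # 8000 is an illustrative value, for example 4 cores at 2000 MHz.
  cpu_total_compute = 8000
}
```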

Inclusive language

Starting with Nomad v1.0.0, the terms blacklist and whitelist have been deprecated from client configuration and driver configuration. The existing configuration values are permitted but will be removed in a future version of Nomad. The specific configuration values replaced are:

Client driver.blacklist is replaced with driver.denylist.
Client driver.whitelist is replaced with driver.allowlist.
Client env.blacklist is replaced with env.denylist.
Client fingerprint.blacklist is replaced with fingerprint.denylist.
Client fingerprint.whitelist is replaced with fingerprint.allowlist.
Client user.blacklist is replaced with user.denylist.
Client template.function_blacklist is replaced with template.function_denylist.
Docker driver docker.caps.whitelist is replaced with docker.caps.allowlist.
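The renames above can be sketched for two of the client options as follows. The option values are illustrative assumptions; only the key names come from the list above.

```hcl
client {
  options = {
    # Previously "driver.whitelist"
    "driver.allowlist" = "docker,exec"

    # Previously "user.blacklist"
    "user.denylist" = "root,ubuntu"
  }
}
```

The deprecated keys are still accepted in v1.0.0, so the change can be rolled out gradually across agent configurations.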

Consul Connect

Nomad 1.0's Consul Connect integration works best with Consul 1.9 or later. The ideal upgrade path is:

Create a new Nomad client image with Nomad 1.0 and Consul 1.9 or later.
Add new hosts based on the image.
Drain and shut down old Nomad client nodes.

While in-place upgrades and older versions of Consul are supported by Nomad 1.0, Envoy proxies will drop and stop accepting connections while the Nomad agent is restarting. Nomad 1.0 with Consul 1.9 does not have this limitation.

Envoy proxy versions

Nomad v1.0.0 changes how the Envoy version used for Connect sidecar proxies is selected. Previously, Nomad always defaulted to Envoy v1.11.2 if neither the meta.connect.sidecar_image parameter nor the sidecar_task block was explicitly configured. Likewise, the same version of Envoy was used for Connect ingress gateways if meta.connect.gateway_image was unset. Starting with Nomad v1.0.0, each Nomad Client queries Consul for a list of supported Envoy versions and uses the latest version supported by the Consul agent when launching Envoy as a Connect sidecar proxy. If the version of the Consul agent is older than v1.7.8, v1.8.4, or v1.9.0, Nomad falls back to Envoy v1.11.2. As before, if meta.connect.sidecar_image, meta.connect.gateway_image, or the sidecar_task block is set, those settings take precedence.

When upgrading Nomad Clients from a previous version to v1.0.0 or later, we recommend also upgrading the Consul agents to v1.7.8, v1.8.4, or v1.9.0 or newer. Upgrading Nomad and Consul to versions that support the new behavior, while also doing a full node drain of each node at upgrade time, ensures that Connect workloads are rescheduled onto nodes where the Nomad Clients, Consul agents, and Envoy sidecar tasks remain compatible with one another.
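To pin the Envoy image instead of relying on the version reported by Consul, the image parameters can be set in client metadata. This is a sketch only; the image tag is an illustrative assumption and should be replaced with a version supported by your Consul agents.

```hcl
client {
  meta {
    # Illustrative image tag, not a recommendation.
    "connect.sidecar_image" = "envoyproxy/envoy:v1.16.0"
    "connect.gateway_image" = "envoyproxy/envoy:v1.16.0"
  }
}
```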

Envoy worker threads

Nomad v1.0.0 changes the default behavior around the number of worker threads created by the Envoy sidecar proxy when using Consul Connect. Previously, the Envoy --concurrency argument was left unset, which caused Envoy to spawn as many worker threads as logical cores available on the CPU. The --concurrency value now defaults to 1 and can be configured by setting the meta.connect.proxy_concurrency property in client configuration.
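For example, a client that should give each Envoy proxy more than one worker thread could be configured roughly as follows. The value "2" is an illustrative assumption.

```hcl
client {
  meta {
    # Passed to Envoy as --concurrency; defaults to 1 when unset.
    "connect.proxy_concurrency" = "2"
  }
}
```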

Nomad 0.12.8

Docker volume mounts

Nomad 0.12.8 includes security fixes for the handling of Docker volume mounts:

The docker.volumes.enabled flag now defaults to false as documented.
Docker driver mounts of type "volume" (but not "bind") were not sandboxed and could mount arbitrary locations from the client host. The docker.volumes.enabled configuration will now disable Docker mounts with type "volume" when set to false (the default).

This change impacts jobs that use Docker mounts with type "volume", as shown below. This job will fail when placed unless docker.volumes.enabled = true.

mounts = [
  {
    type     = "volume"
    target   = "/path/in/container"
    source   = "docker_volume"
    volume_options = {
      driver_config = {
        name = "local"
        options = [
          {
            device = "/"
            o      = "ro,bind"
            type   = "ext4"
          }
        ]
      }
    }
  }
]

Nomad 0.12.6

Artifact and Template Paths

Nomad 0.12.6 includes security fixes for privilege escalation vulnerabilities in handling of job template and artifact blocks:

The template.source and template.destination fields are now protected by the file sandbox introduced in 0.9.6. These paths are now restricted to fall inside the task directory by default. An operator can opt out of this protection with the template.disable_file_sandbox field in the client configuration.
The paths for template.source, template.destination, and artifact.destination are validated on job submission to ensure the paths do not escape the file sandbox. It was possible to use interpolation to bypass this validation. The client now interpolates the paths before checking if they are in the file sandbox.

Warning: Due to a bug in Nomad v0.12.6, the template.destination and artifact.destination paths do not support absolute paths, including the interpolated NOMAD_SECRETS_DIR, NOMAD_TASK_DIR, and NOMAD_ALLOC_DIR variables. This bug is fixed in v0.12.9. To work around the bug, use a relative path.
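The workaround can be sketched as follows: destinations are written relative to the task directory rather than built from the interpolated variables. The task name, image, file names, template contents, and artifact URL are hypothetical.

```hcl
task "server" {
  driver = "docker"

  config {
    image = "example/server:1.0" # hypothetical image
  }

  template {
    data = "listen_port = 8080"

    # Relative path instead of "${NOMAD_TASK_DIR}/app.conf"
    destination = "local/app.conf"
  }

  artifact {
    source = "https://example.com/assets.tar.gz" # hypothetical URL

    # Relative path instead of "${NOMAD_TASK_DIR}/assets"
    destination = "local/assets"
  }
}
```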

Nomad 0.12.0

`mbits` and Task Network Resource deprecation

Starting in Nomad 0.12.0, the mbits field of the network resource block has been deprecated and is no longer considered when making scheduling decisions. This is in part because we felt that mbits didn't accurately account for network bandwidth as a resource.

Additionally, the use of the network block inside of a task's resource block is deprecated. Users are advised to move their network block to the group block. Recent networking features have only been added to group-based network configuration. If any use case or feature that was available with the task network resource is not fulfilled by group network configuration, please open an issue detailing the missing capability.

Finally, the docker driver's port_map configuration is deprecated in favor of the ports field.
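A migrated job might look roughly like the sketch below, with the network block at the group level and the Docker ports field replacing port_map. The group, task, and port names, the port number, and the image are hypothetical.

```hcl
group "web" {
  # Group-level network block, replacing the task-level
  # resources { network { ... } } block.
  network {
    port "http" {
      to = 8080 # port the application listens on inside the container
    }
  }

  task "server" {
    driver = "docker"

    config {
      image = "example/web:1.0" # hypothetical image

      # Replaces port_map; refers to the port label defined above.
      ports = ["http"]
    }
  }
}
```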

Enterprise Licensing

Enterprise binaries for Nomad are now publicly available via releases.hashicorp.com. By default all enterprise features are enabled for 6 hours. During that time enterprise users should apply their license with the nomad license put ... command.

Once the 6 hour demonstration period expires, Nomad will shut down. If restarted, Nomad will shut down again within a very short time unless a valid license is applied.

Warning: Due to a bug in Nomad v0.12.0, existing clusters that are upgraded will not have 6 hours to apply a license. The minimal grace period should be sufficient to apply a valid license, but enterprise users are encouraged to delay upgrading until Nomad v0.12.1 is released and fixes the issue.

Docker access host filesystem

Nomad 0.12.0 disables Docker tasks' access to the host filesystem by default. Prior to Nomad 0.12, Docker tasks could mount and then manipulate any host file, which posed a security risk.

Operators must now explicitly allow tasks to access the host filesystem. Host volumes provide fine-grained access to individual paths.
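As a sketch of the host volume approach, an operator exposes a single path in the client configuration and a job then requests it by name. The volume name, paths, group, task, and image are hypothetical.

```hcl
# Client agent configuration
client {
  host_volume "certs" {
    path      = "/etc/ssl/certs"
    read_only = true
  }
}
```

```hcl
# Job specification (group level)
group "web" {
  volume "certs" {
    type      = "host"
    source    = "certs"
    read_only = true
  }

  task "server" {
    driver = "docker"

    config {
      image = "example/web:1.0" # hypothetical image
    }

    volume_mount {
      volume      = "certs"
      destination = "/etc/ssl/certs"
      read_only   = true
    }
  }
}
```

With this layout the task sees only the path the operator chose to expose, rather than arbitrary host locations.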

To restore pre-0.12.0 behavior, you can enable Docker volumes to allow binding host paths by adding the following to the Nomad client config file:

plugin "docker" {
  config {
    volumes {
      enabled = true
    }
  }
}

QEMU images

Nomad 0.12.0 restricts the paths that QEMU tasks can load an image from. A QEMU task may download an image to the allocation directory and load it from there, but images outside the allocation directories must be explicitly allowed by operators in the client agent configuration file.

For example, you may allow loading QEMU images from /mnt/qemu-images by adding the following to the agent configuration file:

plugin "qemu" {
  config {
    image_paths = ["/mnt/qemu-images"]
  }
}

Nomad 0.11.7

Docker volume mounts

Nomad 0.11.7 includes a security fix for the handling of Docker volume mounts. Docker driver mounts of type "volume" (but not "bind") were not sandboxed and could mount arbitrary locations from the client host. The docker.volumes.enabled configuration will now disable Docker mounts with type "volume" when set to false.

This change impacts jobs that use Docker mounts with type "volume", as shown below. This job will fail when placed unless docker.volumes.enabled = true.

mounts = [
  {
    type     = "volume"
    target   = "/path/in/container"
    source   = "docker_volume"
    volume_options = {
      driver_config = {
        name = "local"
        options = [
          {
            device = "/"
            o      = "ro,bind"
            type   = "ext4"
          }
        ]
      }
    }
  }
]

Nomad 0.11.5

Artifact and Template Paths

Nomad 0.11.5 includes backported security fixes for privilege escalation vulnerabilities in handling of job template and artifact blocks:

The template.source and template.destination fields are now protected by the file sandbox introduced in 0.9.6. These paths are now restricted to fall inside the task directory by default. An operator can opt out of this protection with the template.disable_file_sandbox field in the client configuration.
The paths for template.source, template.destination, and artifact.destination are validated on job submission to ensure the paths do not escape the file sandbox. It was possible to use interpolation to bypass this validation. The client now interpolates the paths before checking if they are in the file sandbox.
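For operators who need to opt out, the setting might be sketched as below. This assumes the template block of the client configuration is where disable_file_sandbox lives in your Nomad version; check the client configuration reference before relying on it, since disabling the sandbox removes the protection described above.

```hcl
client {
  template {
    # Assumption: opt out of the template file sandbox.
    # Only do this if jobs must read or write outside the task directory.
    disable_file_sandbox = true
  }
}
```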

Warning: Due to a bug in Nomad v0.11.5, the template.destination and artifact.destination paths do not support absolute paths, including the interpolated NOMAD_SECRETS_DIR, NOMAD_TASK_DIR, and NOMAD_ALLOC_DIR variables. This bug is fixed in v0.11.6. To work around the bug, use a relative path.

Nomad 0.11.3

Nomad 0.11.3 fixes a critical bug causing the nomad agent to become unresponsive. The issue is due to a Go 1.14.1 runtime bug and affects Nomad 0.11.1 and 0.11.2.

Nomad 0.11.2

Scheduler Scoring Changes

Prior to Nomad 0.11.2, the scheduler algorithm used a node's reserved resources incorrectly during scoring. As a result of this bug, scoring was biased in favor of nodes with reserved resources over nodes without reserved resources.

Placements will be more correct but slightly different in v0.11.2 compared with earlier versions of Nomad. Operators do not need to take any action, as the bug fix will only minimally affect scoring.

Feasibility (whether a node is capable of running a job at all) is not affected.

Periodic Jobs and Daylight Saving Time

Nomad 0.11.2 fixed a long-standing bug affecting periodic jobs that are scheduled to run during Daylight Saving Time transitions.

Nomad 0.11.2 provides more clearly defined behavior: Nomad evaluates the cron expression with respect to the specified time zone during the transition. A 2:30am nightly job with the America/New_York time zone will not run on the day daylight saving time starts; similarly, a 1:30am nightly job will run twice on the day daylight saving time ends. See the Daylight Saving Time documentation for details.
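The 2:30am case described above corresponds to a periodic block like the following sketch. The job, group, and task names and the image are hypothetical.

```hcl
job "nightly-report" {
  datacenters = ["dc1"]
  type        = "batch"

  periodic {
    # 2:30am every day, evaluated in the time zone below.
    # This run is skipped on the day daylight saving time starts,
    # because 2:30am does not exist in local time on that day.
    cron      = "30 2 * * *"
    time_zone = "America/New_York"
  }

  group "report" {
    task "generate" {
      driver = "docker"

      config {
        image = "example/report:1.0" # hypothetical image
      }
    }
  }
}
```

Scheduling such jobs outside the transition window, or in a time zone without daylight saving time such as UTC, avoids both the skipped and the doubled run.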

Nomad 0.11.0

client.template: `vault_grace` deprecation

Nomad 0.11.0 updates consul-template to v0.24.1. This library deprecates the vault_grace option for templating included in Nomad. The feature has been ignored since Vault 0.5 and as long as you are running a more recent version of Vault, you can safely remove vault_grace from your Nomad jobs.

Rkt Task Driver Removed

The rkt task driver has been deprecated and removed from Nomad. While the code is available in the external nomad-driver-rkt repository, it will not be maintained as rkt is no longer being developed upstream. We encourage all rkt users to find a new task driver as soon as possible.

Nomad 0.10 and earlier

Refer to the Nomad releases page in the Nomad repo for prior release changelogs.
