Secure Nomad Jobs with Consul Service Mesh
Nomad's first-class integration with Consul allows operators to design jobs that natively leverage Consul service mesh. However, in ACL-enabled Consul clusters there are a few extra steps required to ensure that your Nomad servers and clients have Consul ACL tokens with sufficient privileges to register the additional services needed for the required sidecar proxies. This tutorial walks through those steps, has you run a sample Connect workload, and introduces the allow_unauthenticated value, so that you can configure your own Nomad cluster to run Connect jobs against your own ACL-enabled Consul cluster.
Prerequisites
To complete this tutorial you will need:
- Nomad v0.10.4 or greater
- a Nomad environment with Nomad and Consul installed. You can use this Terraform environment to provision a sandbox environment. This guide assumes a cluster with one node running both Consul and Nomad in server mode, and one or more nodes running Nomad and Consul in client mode.
- a Consul cluster that is ACL-enabled and bootstrapped
- a management token

You can use the "Secure Consul with ACLs" tutorial to configure a Consul cluster for this guide.
If your Consul cluster is TLS-enabled for agent communication and you are using Nomad version 0.10, you will need to provide some Consul configuration as environment variables for your Nomad process. This can be done by modifying your init scripts or systemd system units. This will be discussed later in the guide.
Note
This tutorial is for demo purposes and uses only a single Nomad server with a Consul server configured alongside it. In a production cluster, 3 or 5 Nomad server nodes are recommended, along with a separate Consul cluster. Consult the Consul Reference Architecture to learn how to securely deploy a Consul cluster.
Generate Consul ACL tokens for Nomad
Create a Nomad server policy
Define the Nomad server policy by making a file named nomad-server-policy.hcl
with this content.
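A minimal sketch of such a policy, based on the general Nomad and Consul ACL integration guidance (your rules may differ), grants Nomad servers read access to agents and nodes and write access to services and ACLs:

```hcl
# nomad-server-policy.hcl -- sketch; tailor the rules to your cluster's security requirements
agent_prefix "" {
  policy = "read"
}

node_prefix "" {
  policy = "read"
}

service_prefix "" {
  policy = "write"
}

# Commonly required so Nomad servers can derive service identity tokens for Connect tasks
acl = "write"
```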
Create the Nomad server policy by uploading this file.
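For example, with a management token loaded in your environment (the policy name "nomad-server" is a convention, not a requirement):

```shell
consul acl policy create \
  -name "nomad-server" \
  -description "Nomad Server Policy" \
  -rules @nomad-server-policy.hcl
```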
The command outputs information about the newly created policy and its rules.
Create a Nomad client policy
Define the Nomad client policy by making a file named nomad-client-policy.hcl
with this content.
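Again as a sketch, the client policy is typically a subset of the server policy; clients need to read agent and node data and register services:

```hcl
# nomad-client-policy.hcl -- sketch; tailor the rules to your cluster's security requirements
agent_prefix "" {
  policy = "read"
}

node_prefix "" {
  policy = "read"
}

service_prefix "" {
  policy = "write"
}
```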
Create the Nomad client policy by uploading this file.
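For example (the policy name "nomad-client" is a convention):

```shell
consul acl policy create \
  -name "nomad-client" \
  -description "Nomad Client Policy" \
  -rules @nomad-client-policy.hcl
```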
The command outputs information about the newly created policy and its rules.
Create a token for Nomad
Generate a token associated with these policies and save it to a file named nomad-agent.token. Because this tutorial is written for a node that is both a client and a server, apply both policies to the token. Typically, you would generate tokens with the nomad-server policy for your Nomad server nodes and tokens with the nomad-client policy for your Nomad client nodes.
Consider applying roles instead of rotating tokens
If your Nomad node already has a Consul token, it is better to add the required policies or roles to that existing token rather than replacing it with a new one.
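One way to do this, assuming the policy names used above:

```shell
consul acl token create \
  -description "Nomad agent token" \
  -policy-name "nomad-server" \
  -policy-name "nomad-client" | tee nomad-agent.token
```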
The command will return a new Consul token for use in your Nomad configuration.
Update Nomad's Consul configuration
Open your Nomad configuration file on all of your nodes and add a consul stanza with your token.
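As a sketch, using the SecretID from the token output you saved to nomad-agent.token:

```hcl
consul {
  token = "<SecretID from nomad-agent.token>"
}
```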
Provide environment variables for TLS-enabled Consul
If you are using Nomad version 0.10 and your Consul cluster is TLS-enabled, you will need to provide additional Consul configuration as environment variables to the Nomad process. This works around a known issue in Nomad (hashicorp/nomad#6594). Refer to the TLS-enabled Consul environment section in the "Advanced considerations" appendix of this tutorial for details. You can return here after you read that material.
Alternative architectures (non-x86/amd64)
If you are running on ARM or another non-x86/amd64 architecture, jump to the Alternative architectures section in the "Advanced considerations" appendix of this tutorial for details. You can return here after you read that material.
Restart Nomad to load new configuration
Run systemctl restart nomad to restart Nomad and load these changes.
Run a Connect-enabled job
Create the job specification
Create the "countdash" job by copying this job specification into a file named
countdash.nomad.hcl
.
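If you need a starting point, the following is a sketch of the standard countdash example. The datacenter name and container image tags are illustrative; substitute the values available in your environment.

```hcl
job "countdash" {
  datacenters = ["dc1"]

  group "api" {
    network {
      mode = "bridge"
    }

    service {
      name = "count-api"
      port = "9001"

      connect {
        sidecar_service {}
      }
    }

    task "web" {
      driver = "docker"

      config {
        # Illustrative image; use the counter demo image available to you
        image = "hashicorpnomad/counter-api:v1"
      }
    }
  }

  group "dashboard" {
    network {
      mode = "bridge"

      port "http" {
        static = 9002
        to     = 9002
      }
    }

    service {
      name = "count-dashboard"
      port = "9002"

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "count-api"
              local_bind_port  = 8080
            }
          }
        }
      }
    }

    task "dashboard" {
      driver = "docker"

      env {
        COUNTING_SERVICE_URL = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
      }

      config {
        # Illustrative image; use the dashboard demo image available to you
        image = "hashicorpnomad/counter-dashboard:v1"
      }
    }
  }
}
```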
Create an intention
In Consul, the default intention behavior is defined by the default ACL policy. If the default ACL policy is "allow all", then all service mesh connections are allowed by default. If the default ACL policy is "deny all", then all service mesh connections are denied by default.
To avoid unexpected behavior around this, it is better to create an explicit intention. Create an intention to allow traffic from the count-dashboard service to the count-api service.
First, create a file for a config entry definition named intention-config.hcl.
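A sketch of the config entry (the service-intentions config entry kind requires a Consul version that supports it, 1.9 or later):

```hcl
Kind = "service-intentions"
Name = "count-api"

Sources = [
  {
    Name   = "count-dashboard"
    Action = "allow"
  }
]
```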
From the same directory where you saved this file, run the following command in your terminal to initialize these intention rules.
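Writing a config entry definition is done with consul config write; with a suitably privileged Consul token in your environment:

```shell
consul config write intention-config.hcl
```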
Note: To learn more about intentions, check out the Service Intentions documentation.
Run the job
Run the job by calling nomad run countdash.nomad.hcl.
The command will output the result of running the job and show the allocation IDs of the two new allocations that are created.
Once you are done, run nomad stop countdash to prepare for the next step.
The command will output evaluation information about the stop request and stop the allocations in the background.
Use Consul authentication on jobs
By default, Nomad does not require an operator to authenticate with Consul when submitting a job, and it will create Consul ACL permissions at whatever level the Nomad server's token allows. In some scenarios, this can allow an operator to escalate their privileges to those of the Nomad server.
To prevent this, you can set the allow_unauthenticated option to false.
Update Nomad configuration
Open your Nomad configuration file on all of your nodes and add the allow_unauthenticated value inside of the consul configuration block.
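For example, the consul block would then look something like this:

```hcl
consul {
  token                 = "<SecretID from nomad-agent.token>"
  allow_unauthenticated = false
}
```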
Run the systemctl restart nomad command to restart Nomad and load these changes.
Submit the job with a Consul token
Start by unsetting the Consul token in your shell session.
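For example:

```shell
unset CONSUL_HTTP_TOKEN
```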
Now, try running countdash.nomad.hcl again. This time, Nomad returns an error indicating that you need to supply a Consul ACL token in order to run the job.
Nomad will not allow you to submit a job to the cluster without providing a Consul token that has write access to the Consul service that the job defines.
You can supply the token in a few ways:
- the CONSUL_HTTP_TOKEN environment variable
- the -consul-token flag on the command line
- the X-Consul-Token header on API calls
Reload your management token into the CONSUL_HTTP_TOKEN environment variable.
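For example, substituting your own management token value:

```shell
export CONSUL_HTTP_TOKEN=<your management token>
```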
Now, try running countdash.nomad.hcl again. This time it will succeed.
Advanced considerations
Alternative architectures
Nomad provides a default link to a pause image. This image, however, is architecture specific and is only provided for the amd64 architecture. In order to use Consul service mesh on non-x86/amd64 hardware, you will need to configure Nomad to use a different pause container. If Nomad is trying to use a version of Envoy earlier than 1.16, you will need to specify a different Envoy version as well. Read through the section on airgapped networks below; it explains the same configuration elements that you will need to set to use alternative containers for service mesh.
Special thanks to @GusPS, who reported this working configuration.
Envoy 1.16 now has ARM64 support. Configure it as your sidecar image by setting the connect.sidecar_image meta variable on each of your ARM64 clients.
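As a sketch, assuming an Envoy 1.16 image tag that is published for your platform:

```hcl
client {
  meta {
    # Illustrative tag; pick an Envoy 1.16+ build with ARM64 support
    "connect.sidecar_image" = "envoyproxy/envoy:v1.16.0"
  }
}
```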
The rancher/pause container has versions for several different architectures as well. Override the default pause container and use it instead. In your client configuration, add an infra_image to your docker plugin configuration, overriding the default with the rancher version.
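For example, in your Nomad client configuration (the tag shown is illustrative; pick a rancher/pause tag published for your architecture):

```hcl
plugin "docker" {
  config {
    infra_image = "rancher/pause:3.2"
  }
}
```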
If you came here from the "Alternative architectures (non-x86/amd64)" note above, return there now.
Airgapped networks or proxied environments
If you are in an airgapped network or need to access Docker Hub via a proxy, you will have to perform some additional configuration on your Nomad clients to enable Nomad's Consul service mesh integration.
Set the "infra_image" path
Set the infra_image configuration option for the Docker driver plugin on your Nomad clients to a path that is accessible in your environment. For example:
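Here, registry.example.com is a placeholder for a registry or image mirror that is reachable from your network:

```hcl
plugin "docker" {
  config {
    # Placeholder registry; point at an image mirror you can reach
    infra_image = "registry.example.com/google_containers/pause-amd64:3.1"
  }
}
```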
Changing this value will require a restart of Nomad.
Set the "sidecar_image" path
You will also need the Envoy proxy image used for Consul service mesh networking. Override the default container path by adding a "connect.sidecar_image" value to the meta stanza inside the client stanza of your Nomad client configuration. If you do not have a meta stanza inside of your top-level client stanza, add one as follows.
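A sketch, again using registry.example.com as a placeholder registry and an illustrative Envoy tag:

```hcl
client {
  meta {
    # Placeholder registry and tag; pin the Envoy version your Consul release supports
    "connect.sidecar_image" = "registry.example.com/envoyproxy/envoy:v1.16.0"
  }
}
```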
Changing this value will require a restart of Nomad.
Next steps
Now that you have completed this guide, you have:
- configured your cluster with a Consul token
- run a non-validated job using the Nomad server's Consul token
- run a validated job using the permissions of a user-provided Consul token
Now that you have run both a non-validated job and a user-token-validated job, which is right for your environment? All of these steps can also be done using the Nomad API directly; which path might you use for your use case?
Reference material
Learn more about Consul service mesh in these Learn guides:
- Secure Service-to-Service Communication
- Consul Service Mesh in Production
Learn more about Consul ACLs:
Study Nomad's Consul configuration: