Cloud Deployment (AWS, Azure, GCP virtual machines)
This section provides a detailed guide for manually installing a Nomad cluster on AWS, Azure, and GCP instances. This approach suits organizations that require fine-grained control over their Nomad deployment or have specific compliance requirements that necessitate manual installation. Before proceeding, ensure you have read and understood the Nomad Architecture and Deployment requirements pages.
Architectural Summary
As referenced in the Nomad architecture section, each Nomad node in a cluster resides on a separate VM, with shared or dedicated tenancy both being acceptable. A single Nomad cluster is deployed within a single region, with nodes distributed across all available Availability Zones (AZs). This design leverages Nomad’s redundancy zone feature and deploys a minimum of 3 voting server nodes and 3 non-voting server nodes across 3 AZs.
Server Naming and IP Addressing
For illustrative purposes, we’ll use the following naming convention and IP addressing scheme for our Nomad servers:
DNS NAME | IP ADDRESS | NODE TYPE | LOCATION |
---|---|---|---|
nomad-1.domain | 10.0.0.1 | Voting server | us-east-1a |
nomad-2.domain | 10.0.0.2 | Voting server | us-east-1b |
nomad-3.domain | 10.0.0.3 | Voting server | us-east-1c |
nomad-4.domain | 10.0.0.4 | Non-voting server | us-east-1a |
nomad-5.domain | 10.0.0.5 | Non-voting server | us-east-1b |
nomad-6.domain | 10.0.0.6 | Non-voting server | us-east-1c |
nomad.domain | 10.0.0.100 | NLB | (all zones) |
Note: These names and IP addresses are examples only. Adjust them according to your specific environment and naming conventions.
Certificates
Create a standard X.509 certificate that will be installed on the Nomad servers. Follow your organization’s process for creating a new certificate that matches the DNS record you intend to use for accessing Nomad. This guide assumes the conventions and self-signed certificates from the Nomad Architecture section and the Enable TLS tutorial.
You will need a total of three files:
- CA public certificate (`nomad-agent-ca.pem`)
- Server node certificate’s private key (`global-server-nomad-key.pem`)
- Server node public certificate (`global-server-nomad.pem`)
These certificates will be distributed to the server nodes in a later section and used to secure communications between Nomad clients, servers, and the API.
Additionally, a fourth certificate and key were generated: `global-cli-nomad.pem` and `global-cli-nomad-key.pem`. These will be used later in this guide to interact with the cluster. The `global-` prefix for the certificate and key assumes a multi-region environment. You can generate certificates on a per-region basis; however, in this guide we use a single certificate and key across all clusters globally.
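If you are following the self-signed approach from the Enable TLS tutorial, the files listed above can be produced with Nomad's built-in TLS helper. The commands below are a sketch of that workflow; if your organization uses its own PKI, substitute its process instead.
nomad tls ca create           # writes nomad-agent-ca.pem and nomad-agent-ca-key.pem
nomad tls cert create -server # writes global-server-nomad.pem and global-server-nomad-key.pem
nomad tls cert create -cli    # writes global-cli-nomad.pem and global-cli-nomad-key.pem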
Note
Ensure the certificate’s Common Name (CN) or Subject Alternative Names (SANs) include the DNS names you’ll use to access Nomad, such as `nomad.yourdomain.com`. If you’re using separate DNS names for server-to-server communication, include these in the SANs as well. While Nomad's TLS configuration will be production ready, key management and rotation is a complex subject not covered by this guide. Vault is the suggested solution for key generation and management.
Firewall Rules
Create firewall rules that allow these ports bidirectionally:
Port | Name | Purpose | Protocol |
---|---|---|---|
4646 | HTTP API | Used by clients and servers to serve the HTTP API | TCP only |
4647 | RPC | Used for internal RPC communication between client agents and servers, and for inter-server traffic | TCP only |
4648 | Serf WAN | Used by servers to gossip both over the LAN and WAN to other servers. Not required for Nomad clients to reach this address | TCP and UDP |
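On AWS, these rules typically live in a security group attached to the Nomad instances. The commands below are an illustrative sketch only; the security group ID and CIDR range are placeholders you would replace with your own values.
# Allow Nomad's HTTP, RPC, and Serf ports from the VPC CIDR (placeholder values).
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 4646-4648 --cidr 10.0.0.0/16
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol udp --port 4648 --cidr 10.0.0.0/16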
Load Balancing the API and UI
- Ensure your load balancer is in front of all Nomad server nodes, across all availability zones, and all subnets where the Nomad server nodes are deployed.
- Utilize TCP ports 4646 (HTTP API) and 4647 (RPC). The target groups associated with each of these listeners will contain all Nomad server nodes.
- Use the `/v1/agent/health` endpoint for health checking.
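Once the servers are running (later in this guide), you can probe the same endpoint the load balancer uses. A minimal check from a server node, assuming the default certificate SANs that include `localhost`:
# Probe the health endpoint the load balancer will use (run on a server node).
curl --cacert /etc/nomad.d/tls/nomad-agent-ca.pem https://localhost:4646/v1/agent/health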
Download and Install the Nomad CLI
To interact with your Nomad cluster, you’ll need to install the Nomad binary. Follow the steps for your OS on the Install Nomad documentation page.
Platform-specific guidance
Select the tab below for your cloud provider for further guidance and the corresponding machine size recommendations.
AWS
Compute
It is recommended to use `m5.xlarge` or `m5.2xlarge` instance types to provide the necessary vCPU and RAM resources for Nomad. Instance types from other instance families can also be used, provided that they meet or exceed the above resource recommendations.
EC2 Volume Storage
Nomad can be very IOPS intensive as it writes chunks of data from memory to storage. Initially, EBS volumes with Provisioned IOPS SSD (io1 or io2) are recommended as they meet the disk performance needs for most use cases. These provide a balance of performance and manageability suitable for typical Nomad deployments.
As your cluster scales or experiences large batch job scheduling events, you may encounter storage performance limitations. In such scenarios, consider starting or transitioning to instances with local NVMe storage. While this can offer significant performance benefits, be aware that it requires additional OS-level configuration during instance setup. This trade-off between enhanced performance and increased setup complexity should be carefully evaluated based on your specific workload demands and operational capabilities. Regularly monitor your IOPS metrics using your preferred monitoring solution to identify when you're approaching storage performance limits. For detailed guidance on monitoring Nomad server storage metrics, refer to the Observability section of the Nomad Operating guide.
Snapshot Storage
When deploying in AWS, use an S3 bucket for storage of Automated Snapshots. The S3 bucket should be deployed in a different region from the compute instances, to allow the possibility of restoring the cluster in case of regional failure. S3 Versioning and MFA Delete can be used to ensure that the snapshots cannot be deleted by mistake or a malicious actor.
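As an illustration of this recommendation, the following sketch creates a versioned bucket in a different region from the cluster; the bucket name and region are placeholders.
# Create the snapshot bucket outside the cluster's region and turn on versioning (example names).
aws s3api create-bucket --bucket example-nomad-snapshots --region us-west-2 --create-bucket-configuration LocationConstraint=us-west-2
aws s3api put-bucket-versioning --bucket example-nomad-snapshots --versioning-configuration Status=Enabled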
Create IAM role for Cloud Auto-join
It is recommended that you create an IAM role used only for auto-joining. The only required IAM permission is `ec2:DescribeInstances`, and if the region is omitted it will be discovered through the local instance's EC2 metadata endpoint. Visit Auto join AWS Provider for more information.
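For reference, a minimal policy granting only that permission might look like the following sketch; the policy name is illustrative and not defined elsewhere in this guide.
# Create an example policy that allows only ec2:DescribeInstances (attach it to your auto-join role afterwards).
aws iam create-policy \
  --policy-name nomad-cloud-auto-join \
  --policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Action":"ec2:DescribeInstances","Resource":"*"}]}'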
Obtain the Nomad Enterprise License File
Obtain the Nomad Enterprise License File from your HashiCorp account team. This file contains a license key unique to your environment. The file will be named something like `license.hclic`.
Keep this file handy, as you will need it later in the installation process.
Deployment Process
Create and Connect to your EC2 instances
- Launch 2 instances in each of the 3 AZs, for a total of 6 instances, using the AMI of your choice. Ubuntu 24.04 is the OS used in the following example; some commands will have to be adjusted if you're using a different OS (most notably, `apt` has to be replaced with your distro's package manager).
- Ensure the IAM Instance Profile is set to the Cloud Auto-join role you previously created.
- Ensure the instances are tagged with `tag_key=cloud-auto-join` and `tag_value=true` (an illustrative launch command follows the OS update step below).
- Update the OS
ssh -i your-key.pem ubuntu@your-vm-ip
sudo apt update && sudo apt upgrade -y
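For reference, launching one of these instances from the CLI might look like the following sketch. The AMI, subnet, key pair, and instance profile names are placeholders, not values defined elsewhere in this guide.
# Illustrative only: launch a tagged instance that cloud auto-join can discover.
aws ec2 run-instances \
  --image-id ami-xxxxxxxxxxxxxxxxx \
  --instance-type m5.xlarge \
  --subnet-id subnet-xxxxxxxx \
  --key-name your-key \
  --iam-instance-profile Name=nomad-cloud-auto-join-profile \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=cloud-auto-join,Value=true}]'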
Download and install Nomad
NOMAD_VERSION="1.9.1+ent"
wget https://releases.hashicorp.com/nomad/${NOMAD_VERSION}/nomad_${NOMAD_VERSION}_linux_amd64.zip
unzip nomad_${NOMAD_VERSION}_linux_amd64.zip
sudo mv nomad /usr/local/bin/
Alternatively, follow the instructions on the documentation page to install Nomad from the official HashiCorp package repository for your OS instead of downloading the binary directly.
Create necessary directories
sudo mkdir -p /opt/nomad/data
sudo mkdir -p /etc/nomad.d
sudo mkdir -p /etc/nomad.d/tls
Copy TLS certificates
Copy your TLS certificates to `/etc/nomad.d/tls/`. Because the remote `ubuntu` user typically cannot write to `/etc/nomad.d/tls` directly, copy the files to the home directory first and then move them into place:
scp nomad-agent-ca.pem global-server-nomad-key.pem global-server-nomad.pem ubuntu@your-ec2-instance-ip:~/
ssh ubuntu@your-ec2-instance-ip "sudo mv nomad-agent-ca.pem global-server-nomad-key.pem global-server-nomad.pem /etc/nomad.d/tls/"
Copy License File
echo "02MV4UU43BK5..." >> /etc/nomad.d/license.hclic
Create Gossip Encryption Key
nomad operator gossip keyring generate
Save the output of this key to use in the configuration file.
Note
The `nomad operator gossip keyring generate` command returns a 16-byte key; however, Nomad also supports 32-byte gossip encryption keys. Supplying your own 32-byte key enables AES-256 mode, while supplying a 16-byte key enables AES-128 mode.
Create Nomad configuration file
Create `/etc/nomad.d/nomad.hcl` with the following content or supply your own. The example below includes:
- Cloud Auto-join, which joins servers into the cluster based on AWS tags instead of hard-coding IPs or hostnames.
- Autopilot, which handles automated upgrades.
- Redundancy zones: Nomad uses these values to partition the servers by redundancy zone and aims to keep one voting server per zone. Extra servers in each zone stay as non-voters on standby, ready to be promoted if the active voter leaves or dies. It is recommended to match the name of the AWS availability zone for simplicity. Be sure to change this value based on which AZ the EC2 instance resides in.
- TLS: enables TLS between nodes and for the API. At scale, it's recommended to use HashiCorp Vault; however, for starting out, it is acceptable to use signed certificates copied to the instance.
data_dir = "/opt/nomad/data"
bind_addr = "0.0.0.0" #CHANGE ME to listen only on the required IPs
server {
enabled = true
bootstrap_expect = 3
server_join {
retry_join = ["provider=aws tag_key=cloud-auto-join tag_value=true"]
}
redundancy_zone = "us-east-1b" #CHANGE ME
license_path = "/etc/nomad.d/license.hclic"
encrypt = "YOUR-GOSSIP-KEY" #Paste your key from the above "Create Gossip Encryption Key" step
}
acl {
enabled = true
}
tls {
http = true
rpc = true
ca_file = "/etc/nomad.d/tls/nomad-agent-ca.pem"
cert_file = "/etc/nomad.d/tls/global-server-nomad-key.pem"
key_file = "/etc/nomad.d/tls/global-server-nomad.pem"
}
autopilot {
cleanup_dead_servers = true
last_contact_threshold = "200ms"
max_trailing_logs = 250
server_stabilization_time = "10s"
enable_redundancy_zones = true
disable_upgrade_migration = false
enable_custom_upgrades = false
}
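Rather than hard-coding the `redundancy_zone` value on each instance, you can look it up from EC2 instance metadata when rendering the configuration. This is an optional sketch using IMDSv2; it assumes you template `nomad.hcl` from a script or user-data rather than editing it by hand.
# Fetch the instance's availability zone via IMDSv2 (optional convenience, not required by this guide).
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
AZ=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/placement/availability-zone)
echo "redundancy_zone = \"$AZ\""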
Starting Nomad
Set up Nomad service
sudo nano /etc/systemd/system/nomad.service
[Unit]
Description=Nomad
Documentation=https://www.nomadproject.io/docs/
Wants=network-online.target
After=network-online.target
[Service]
# Nomad servers should be run as the nomad user.
# Nomad clients should be run as root
User=root
Group=root
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/bin/nomad agent -config /etc/nomad.d
KillMode=process
KillSignal=SIGINT
LimitNOFILE=65536
LimitNPROC=infinity
Restart=on-failure
RestartSec=2
TasksMax=infinity
OOMScoreAdjust=-1000
[Install]
WantedBy=multi-user.target
Start the Nomad service
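Because the unit file was just created, reload systemd so it picks up the new unit before enabling it:
sudo systemctl daemon-reload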
sudo systemctl enable nomad
sudo systemctl start nomad
Validate the cluster is bootstrapped and functioning
Confirm that the cluster has elected a leader and that three of the members report `Voter=false` (the non-voting standby servers kept by the redundancy zones).
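One way to check this from a server node is with the Raft peers listing; this sketch assumes the TLS environment variables from the "Set environment variables" step below are exported (and, once ACLs are bootstrapped, a management token).
# List Raft peers; the Voter column shows which servers are voters.
nomad operator raft list-peers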
If you are running into issues with your cluster bootstrapping or the Nomad agent starting, you can view journal logs to reveal any errors:
sudo systemctl status nomad
sudo journalctl -xe
Typically, issues are due to invalid configuration entries in `nomad.hcl`, certificate problems, or networking issues where the nodes cannot communicate with each other.
Initialize ACL system
Log into one of your servers and run the following commands:
Set environment variables
export NOMAD_ADDR=https://127.0.0.1:4646
export NOMAD_CACERT=/etc/nomad.d/tls/nomad-agent-ca.pem
export NOMAD_CLIENT_CERT=/etc/nomad.d/tls/global-server-nomad.pem
export NOMAD_CLIENT_KEY=/etc/nomad.d/tls/global-server-nomad-key.pem
Bootstrap the ACL system. This only needs to be done on one server; the ACLs will sync across the server nodes.
nomad acl bootstrap
Save the Accessor ID and Secret ID and follow the Identity and Access Management section of the Nomad Operating Guide for more information on ACLs and post-installation tasks.
Interact with your Nomad Cluster
export NOMAD_ADDR=https://nomad.domain:4646 #CHANGE ME to your load balancer DNS name
export NOMAD_TOKEN=YourSecretID
export NOMAD_CACERT=./nomad-agent-ca.pem
export NOMAD_CLIENT_CERT=./global-cli-nomad.pem
export NOMAD_CLIENT_KEY=./global-cli-nomad-key.pem
nomad -autocomplete-install && complete -C /usr/local/bin/nomad nomad
Nomad Clients
This section provides guidance on deploying and managing Nomad clients on AWS EC2 instances. Nomad clients are responsible for running tasks and jobs scheduled by the Nomad servers. They register with the server cluster, receive work assignments, and execute tasks.
Preparation
Many of the preparation steps for Nomad clients are similar to those for servers. Refer to the Preparation section for details on the following:
- Creating security groups
- Setting up IAM roles
- Preparing for launching EC2 instances
EC2 Instance Types
When selecting instance types for Nomad clients, consider the resource requirements of your workloads. AWS offers a variety of instance families optimized for different use cases:
Instance Family | Use Case | Workload Examples | Example Instances |
---|---|---|---|
General Purpose (M family) | Balanced compute, memory, and networking resources | Web servers, small-to-medium databases, dev/test environments | m5.xlarge, m6g.2xlarge |
Compute Optimized (C family) | High-performance computing, batch processing, scientific modeling | CPU-intensive applications, video encoding, high-performance web servers | c5.2xlarge, c6g.4xlarge |
Memory Optimized (R family) | High-performance databases, distributed memory caches, real-time big data analytics | SAP HANA, Apache Spark, Presto | r5.2xlarge, r6g.4xlarge |
Storage Optimized (I, D families) | High I/O applications, large databases, data warehousing | NoSQL databases, data processing applications | i3.2xlarge, d2.4xlarge |
GPU Instances (P, G families) | AI, high-performance computing, rendering | AI training and inference, scientific simulations | p3.2xlarge, g4dn.xlarge |
NUMA-aware Instances (X1, X1e families) | High-performance databases, in-memory databases, big data processing engines | SAP HANA, Apache Spark, Presto | x1.16xlarge, x1e.32xlarge |
These instance families can be organized within Nomad using node pools, allowing for workload segregation and enabling job authors to target their workload to the most appropriate hardware resources.
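As a sketch of how that targeting works, a client agent can be assigned to a node pool in its configuration, and a job can then request that pool. The pool name `gpu` below is an example, not something defined elsewhere in this guide.
# Client agent configuration (example): place this client in the "gpu" node pool.
client {
  enabled   = true
  node_pool = "gpu"
}
# Job specification (example): schedule this job only on clients in the "gpu" pool.
job "training" {
  node_pool = "gpu"
  # ...task groups and tasks...
}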
Deployment process
Follow the steps in the Preparation and Installation Process sections of the server deployment section, with a few key differences in configuration. Additionally, ensure your clients are spread across availability zones.
Client TLS Configuration
Nomad clients should use a different certificate than the server nodes. From the same directory as the CA certificate you generated during the server installation steps:
nomad tls cert create -client
This will generate:
- `global-client-nomad.pem` (client certificate)
- `global-client-nomad-key.pem` (client private key)
Copy the certificates to the client instances' home directory and then move them into place:
scp nomad-agent-ca.pem global-client-nomad.pem global-client-nomad-key.pem ubuntu@your-ec2-instance-ip:~/
ssh ubuntu@your-ec2-instance-ip "sudo mv nomad-agent-ca.pem global-client-nomad.pem global-client-nomad-key.pem /etc/nomad.d/tls/"
Remember to keep your private keys secure and implement proper certificate management practices, including regular rotation and secure distribution of certificates to your Nomad clients.
In your `nomad.hcl` file:
data_dir = "/opt/nomad/data"
bind_addr = "0.0.0.0" #CHANGE ME to listen only on the required IPs
client {
enabled = true
server_join {
retry_join = ["provider=aws tag_key=cloud-auto-join tag_value=true"]
}
}
tls {
http = true
rpc = true
ca_file = "/etc/nomad.d/tls/nomad-agent-ca.pem"
cert_file = "/etc/nomad.d/tls/global-client-nomad.pem"
key_file = "/etc/nomad.d/tls/global-client-nomad-key.pem"
}
#...other client configurations...
Consideration for auto scaling groups
While you can use AWS Auto Scaling groups to manually manage the number of Nomad clients, it's often more efficient to leverage the Nomad Autoscaler. The Nomad Autoscaler integrates directly with Nomad and provides more granular control over scaling decisions.
The Nomad Autoscaler can dynamically adjust the number of client instances based on various metrics, including job queue depth, CPU utilization, memory usage, and many more. It can even scale different node pools independently, allowing you to maintain the right mix of instance types for your workloads.
As referenced in the Nomad Architecture, each Nomad node in a cluster resides on a separate EC2 instance, with shared or dedicated tenancy both being acceptable.
Post-Installation Tasks
After installation, perform these tasks to ensure everything is working as expected:
Verify cluster health:
nomad server members #lists all servers
nomad node status #lists all clients
For day 2 activities, such as observability, backups, etc., consult the Nomad Operating Guide.