Manual Installation of Nomad Enterprise On-Premises
This guide provides detailed instructions for deploying Nomad Enterprise in on-premises environments, covering both virtual machine and bare metal installations. Before proceeding, ensure you have reviewed the Nomad Architecture and Deployment Requirements.
Preparation
Server Requirements
For Nomad servers, hardware requirements depend on the size of the cluster and the rate of change. As a rule of thumb, the following specifications are recommended based on cluster size:
Cluster Size | CPU Cores | Memory | Storage |
---|---|---|---|
Small (< 100 clients) | 4-8 cores | 16-32GB | 50GB SSD |
Medium (100-500 clients) | 8-16 cores | 32-64GB | 100GB SSD |
Large (500+ clients) | 16-32 cores | 64-128GB | 200GB+ SSD |
Nomad servers can be very IOPS intensive, as they write chunks of data from memory to storage. Ensure that the storage subsystem can handle the write load (SSDs are a must), and be sure to monitor disk latency and throughput.
For the continued availability of the Nomad cluster, there should be 5-6 servers in the cluster. Nomad uses the Raft consensus algorithm, so you should ensure that a quorum is always maintained. The best way to achieve this depends on the available network and datacenter topology:
- If there are 3 or more failure domains (datacenters close to one another (< 10ms latency), independent datacenter rooms or racks, etc.), deploy 6 servers and make use of the Redundancy zones feature, which ensures that there is the correct number of voting servers in each failure domain and increases the fault tolerance and performance of the cluster.
- If there are only 2 failure domains (2 datacenters), which makes it impossible to place servers in a way that always maintains quorum, consider either deploying the servers in only one of the failure domains, or deploying two clusters of 5 servers each, federated together.
- If there aren't any clear failure domains (single datacenter), deploy 5 servers.
Client Requirements
Client hardware requirements vary significantly based on workload types. Generally, unless you are running in an edge scenario, avoid very small client sizes. As a rule of thumb, aim for at least 4 cores and 8GB of memory per client, adapting as needed based on workload requirements and distribution. Clients don't have to be located in the same datacenter(s) as the servers, and can be connected over unreliable networks (in which case the behaviour of workloads can be tuned with the `disconnect` block).
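For example, a minimal sketch of a job that tunes disconnect handling at the task group level (the job name, driver, image, and field values are illustrative only; consult the `disconnect` block documentation for the full set of options):

```hcl
job "edge-app" {
  group "web" {
    # Illustrative settings: keep allocations on a disconnected client for up to
    # one hour before the scheduler considers them lost, and schedule
    # replacements elsewhere in the meantime.
    disconnect {
      lost_after = "1h"
      replace    = true
    }

    task "server" {
      driver = "docker"

      config {
        image = "nginx:stable"
      }
    }
  }
}
```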
Network Requirements
Nomad requires a few ports to be open for bidirectional communication between agents. The defaults are:
Port | Name | Purpose | Protocol |
---|---|---|---|
4646 | HTTP API | Used by clients and servers to serve the HTTP API | TCP only |
4647 | RPC | Used for internal RPC communication between client agents and servers, and for inter-server traffic | TCP only |
4648 | Serf WAN | Used by servers to gossip both over the LAN and WAN to other servers. Not required for Nomad clients to reach this address | TCP and UDP |
For optimal performance, ensure:
- Low-latency connectivity between servers (< 10ms round-trip time)
- Sufficient bandwidth for job artifacts and container images
- Network segmentation between server and client traffic
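If the hosts run a local firewall, open these ports between the relevant machines. For example, with firewalld (ufw or iptables work equally well; adjust zones and source ranges to your environment):

```shell
sudo firewall-cmd --permanent --add-port=4646-4648/tcp
sudo firewall-cmd --permanent --add-port=4648/udp
sudo firewall-cmd --reload
```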
Certificates
Create a standard X.509 certificate that will be installed on the Nomad servers. Follow your organization’s process for creating a new certificate that matches the DNS record you intend to use for accessing Nomad. This guide assumes the conventions and self-signed certificates from the Nomad Architecture and Enable TLS tutorial.
You will need a total of three files:
- CA public certificate (`nomad-agent-ca.pem`)
- Server node certificate’s private key (`global-server-nomad-key.pem`)
- Server node public certificate (`global-server-nomad.pem`)
These certificates will be distributed to the server nodes in a later section and used to secure communications between Nomad clients, servers, and the API.
Additionally, a fourth certificate and key were generated: `global-cli-nomad.pem` and `global-cli-nomad-key.pem`. These will be used later in this guide to interact with the cluster. The `global-` prefix for the certificate and key assumes a multi-region environment. You can generate certificates on a per-region basis; however, in this guide we use a single certificate and key across all clusters globally.
Note
Ensure the certificate’s Common Name (CN) or Subject Alternative Names (SANs) include the DNS names you’ll use to access Nomad, such as `nomad.yourdomain.com`. If you’re using separate DNS names for server-to-server communication, include these in the SANs as well. While Nomad's TLS configuration will be production ready, key management and rotation is a complex subject not covered by this guide. Vault is the suggested solution for key generation and management.
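As an illustration, the files listed above can be produced with Nomad's built-in TLS helper. The following is a minimal sketch based on the Enable TLS tutorial; the additional DNS name is a placeholder, and flag availability may vary by Nomad version (check `nomad tls cert create -h`):

```shell
# Create a CA (nomad-agent-ca.pem and nomad-agent-ca-key.pem)
nomad tls ca create

# Create the server certificate and key for the global region,
# adding the DNS name used to reach Nomad through the load balancer as a SAN
nomad tls cert create -server -region global -additional-dnsname "nomad.yourdomain.com"

# Create the CLI certificate and key used later in this guide
nomad tls cert create -cli
```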
Load Balancing the API and UI
To load balance the UI and API of the servers (either for human/machine access, CI/CD pipeline consumption, or for initial discovery to join the cluster), you can use any load balancer, in layer 4 or layer 7 mode (caveat: the web UI's `exec` into an allocation uses websockets, so if you need it for debugging purposes, your load balancer must allow for that). Use the `/v1/agent/health` endpoint for health checking. TLS termination can be done at the load balancer to allow for full mTLS with client verification to be enabled without impacting human users connecting to the web UI.
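If you use HAProxy, a minimal layer 4 sketch might look like the following (server names and IP addresses are placeholders; TLS is passed through so Nomad terminates mTLS itself, while health checks speak HTTPS to the API):

```
frontend nomad_api
    bind *:4646
    mode tcp
    default_backend nomad_servers

backend nomad_servers
    mode tcp
    option httpchk GET /v1/agent/health
    http-check expect status 200
    server nomad-server-1 10.0.0.11:4646 check check-ssl verify none
    server nomad-server-2 10.0.0.12:4646 check check-ssl verify none
    server nomad-server-3 10.0.0.13:4646 check check-ssl verify none
```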
Auto-join
The `server_join` configuration option is used to specify the initial set of Nomad servers to join (both for servers to form the cluster, and for clients to join it). This can be a list of IP addresses or DNS names of the servers or a load balancer in front of them, or a cloud auto-join config. There are cloud auto-join providers for a number of on-premises orchestrators (such as VMware vSphere and OpenStack), as well as mDNS. At least one of the provided IPs/DNS names/cloud auto-join providers must be reachable for the Nomad agent to join the cluster; once that happens, all other servers are discovered automatically, so it's not an issue if some of the servers are unreachable.
Servers:
```hcl
server {
  enabled          = true
  bootstrap_expect = 3

  server_join {
    retry_join = [ "1.1.1.1", "2.2.2.2" ]
  }
}
```
Clients:
```hcl
client {
  enabled = true

  server_join {
    retry_join = [ "1.1.1.1", "2.2.2.2" ]
  }
}
```
Obtain the Nomad Enterprise License File
Obtain the Nomad Enterprise license file from your HashiCorp account team. This file contains a license key unique to your environment. The file will be named something like `nomad.hclic`.
Keep this file handy, as you will need it later in the installation process.
Download and Install the Nomad CLI
To interact with your Nomad cluster, you’ll need to install the Nomad binary. Follow the steps for your OS on the Install Nomad documentation page.
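If you prefer to fetch the binary directly — for example, to stage it on the server machines at `/usr/local/bin/nomad`, as the systemd unit later in this guide expects — a minimal sketch for Linux amd64 follows. The version is a placeholder; Enterprise builds carry a `+ent` suffix:

```shell
# Placeholder version - use the Nomad Enterprise release you are licensed for
export NOMAD_VERSION="1.8.4+ent"

curl -fsSLO "https://releases.hashicorp.com/nomad/${NOMAD_VERSION}/nomad_${NOMAD_VERSION}_linux_amd64.zip"
unzip "nomad_${NOMAD_VERSION}_linux_amd64.zip"
sudo install nomad /usr/local/bin/nomad
nomad version
```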
Deployment Process
Infrastructure bootstrapping
Create the necessary infrastructure for your Nomad cluster, using the tooling at your disposal that is best suited for the job - e.g. for bare metal machines that would be PXE boot or an automated provisioning system such as Canonical MAAS or Tinkerbell; for virtual machines that would be Terraform. Ensure that the servers have the necessary network connectivity and storage available, and set up any related infrastructure (such as load balancers, firewall and switch configs, DNS entries, etc.).
The following steps should be automated as much as possible with your preferred config management + image management tools, to ensure consistency and repeatability.
Create necessary directories
```shell
sudo mkdir -p /opt/nomad/data
sudo mkdir -p /etc/nomad.d
sudo mkdir -p /etc/nomad.d/tls
```
Copy TLS certificates
Copy your TLS certificates to `/etc/nomad.d/tls/`:

```shell
sudo scp nomad-agent-ca.pem global-server-nomad-key.pem global-server-nomad.pem ubuntu@server-machine:/etc/nomad.d/tls/
```
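After copying, it is good practice to restrict access to the private key; for example (adjust ownership if you run the agent as a non-root user):

```shell
sudo chmod 0600 /etc/nomad.d/tls/global-server-nomad-key.pem
sudo chmod 0644 /etc/nomad.d/tls/nomad-agent-ca.pem /etc/nomad.d/tls/global-server-nomad.pem
```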
Copy License File
echo "02MV4UU43BK5..." >> /etc/nomad.d/license.hclic
Create Gossip Encryption Key
```shell
nomad operator gossip keyring generate
```
Save the output of this key to use in the configuration file.
Create Nomad configuration file
Create `/etc/nomad.d/nomad.hcl` with the following content or supply your own. The example below includes:
- Autopilot, which handles automated upgrades
- Redundancy zones: Nomad will use these values to partition the servers by redundancy zone/failure domain, and will aim to keep one voting server per zone. Extra servers in each zone will stay as non-voters on standby, to be promoted if the active voter leaves or dies.
- TLS: enables TLS between nodes and for the API. At scale, it's recommended to use HashiCorp Vault; however, for starting out it is acceptable to use signed certificates copied to the machine.
data_dir = "/opt/nomad/data"
bind_addr = "0.0.0.0" #CHANGE ME to listen only on the required IPs
server {
enabled = true
bootstrap_expect = 3
server_join {
retry_join = [ "1.1.1.1", "2.2.2.2" ] #CHANGE ME
}
redundancy_zone = "dc1-rack-b51" #CHANGE ME
license_path = "/etc/nomad.d/license.hclic"
encrypt = "YOUR-GOSSIP-KEY" #Paste your key from the above "Create Gossip Encryption Key" step
}
acl {
enabled = true
}
tls {
http = true
rpc = true
ca_file = "/etc/nomad.d/tls/nomad-agent-ca.pem"
cert_file = "/etc/nomad.d/tls/global-server-nomad-key.pem"
key_file = "/etc/nomad.d/tls/global-server-nomad.pem"
}
autopilot {
cleanup_dead_servers = true
last_contact_threshold = "200ms"
max_trailing_logs = 250
server_stabilization_time = "10s"
enable_redundancy_zones = true
disable_upgrade_migration = false
enable_custom_upgrades = false
}
Starting Nomad
Set up Nomad service
```shell
sudo nano /etc/systemd/system/nomad.service
```
```ini
[Unit]
Description=Nomad
Documentation=https://www.nomadproject.io/docs/
Wants=network-online.target
After=network-online.target

[Service]
# Nomad servers should be run as the nomad user.
# Nomad clients should be run as root
User=root
Group=root
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/bin/nomad agent -config /etc/nomad.d
KillMode=process
KillSignal=SIGINT
LimitNOFILE=65536
LimitNPROC=infinity
Restart=on-failure
RestartSec=2
TasksMax=infinity
OOMScoreAdjust=-1000

[Install]
WantedBy=multi-user.target
```
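After creating the unit file, reload systemd so it picks up the new unit:

```shell
sudo systemctl daemon-reload
```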
Start the Nomad service
```shell
sudo systemctl enable nomad
sudo systemctl start nomad
```
Validate the cluster is bootstrapped and functioning
Verify you have the expected number of servers with the following commands:
```shell
nomad server members
nomad operator raft list-peers
```
If you are running into issues with your cluster bootstrapping or the Nomad agent starting, you can view journal logs to reveal any errors:
```shell
sudo systemctl status nomad
sudo journalctl -xe
```
Typically, issues are due to invalid configuration entries within `nomad.hcl`, certificate problems, or networking issues where the nodes cannot communicate with each other.
Initialize ACL system
Log into one of your servers and run the following commands:
Set environment variables
```shell
export NOMAD_ADDR=https://127.0.0.1:4646
export NOMAD_CACERT=/etc/nomad.d/tls/nomad-agent-ca.pem
export NOMAD_CLIENT_CERT=/etc/nomad.d/tls/global-server-nomad.pem
export NOMAD_CLIENT_KEY=/etc/nomad.d/tls/global-server-nomad-key.pem
```
Bootstrap the ACL system. This only needs to be done on one server; the ACLs will sync across server nodes.
```shell
nomad acl bootstrap
```
Save the Accessor ID and Secret ID and follow the Identity and Access Management section of the Nomad Operating Guide for more information on ACLs and post-installation tasks.
Interact with your Nomad Cluster
```shell
export NOMAD_ADDR=https://nomad-lb
export NOMAD_TOKEN=YourSecretID
export NOMAD_CACERT=./nomad-agent-ca.pem
export NOMAD_CLIENT_CERT=./global-cli-nomad.pem
export NOMAD_CLIENT_KEY=./global-cli-nomad-key.pem

nomad -autocomplete-install && complete -C /usr/bin/nomad nomad
```
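A quick way to confirm that the CLI can reach the cluster through the load balancer and that the token is valid:

```shell
nomad acl token self
nomad server members
```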
Client Installation
The client installation process is similar to the server installation, with a few key differences in the configuration. Nomad clients are responsible for running tasks and jobs scheduled by the Nomad servers. They register with the server cluster, receive work assignments, and execute tasks.
Client TLS Configuration
Nomad clients should use a different certificate than the server nodes. From the same directory as the CA certificate you generated during the server installation steps:
```shell
nomad tls cert create -client
```

This will generate:
- `global-client-nomad.pem` (client certificate)
- `global-client-nomad-key.pem` (client private key)
Copy certificates to the client machines:
```shell
scp nomad-agent-ca.pem global-client-nomad.pem global-client-nomad-key.pem ubuntu@client-machine:/etc/nomad.d/tls/
```
Remember to keep your private keys secure and implement proper certificate management practices, including regular rotation and secure distribution of certificates to your Nomad clients.
In your `nomad.hcl` file:
```hcl
# Base configuration
data_dir  = "/opt/nomad/data"
bind_addr = "0.0.0.0" # CHANGE ME to listen only on the required IPs

# Client configuration
client {
  enabled = true

  # Server join configuration
  server_join {
    retry_join = [ "1.1.1.1", "2.2.2.2" ]
  }

  # Node pool configuration
  # Optionally configure a node pool for this client to join
  # node_pool = "general"
}

# Security configuration
tls {
  http = true
  rpc  = true

  verify_server_hostname = true

  ca_file   = "/etc/nomad.d/tls/nomad-agent-ca.pem"
  cert_file = "/etc/nomad.d/tls/global-client-nomad.pem"
  key_file  = "/etc/nomad.d/tls/global-client-nomad-key.pem"
}

# Telemetry configuration
telemetry {
  publish_allocation_metrics = true
  publish_node_metrics       = true
  prometheus_metrics         = true
}
```
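Clients use the same systemd unit and start procedure as the servers. Assuming the unit file from the server section is already in place on the client machine:

```shell
sudo systemctl daemon-reload
sudo systemctl enable --now nomad

# From a machine with CLI access to the cluster, confirm the client has registered
nomad node status
```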
Post-Installation Tasks
After installation, perform these tasks to ensure everything worked as expected:
Verify cluster health:
```shell
nomad server members # lists all servers
nomad node status    # lists all clients
```
For day 2 activities, such as observability, backups, etc., consult the Nomad Operating Guide.