Manual Installation of Nomad Enterprise On-Premises
This guide provides detailed instructions for deploying Nomad Enterprise in on-premises environments, covering both virtual machine and bare metal installations. Before proceeding, ensure you have reviewed the Nomad Architecture and Deployment Requirements.
Preparation
Server Requirements
For Nomad servers, hardware requirements depend on the size of the cluster and the rate of change. As a rule of thumb, the following specifications are recommended based on cluster size:
Cluster Size | CPU Cores | Memory | Storage |
---|---|---|---|
Small (< 100 clients) | 4-8 cores | 16-32GB | 50GB SSD |
Medium (100-500 clients) | 8-16 cores | 32-64GB | 100GB SSD |
Large (500+ clients) | 16-32 cores | 64-128GB | 200GB+ SSD |
Nomad servers can be very IOPS intensive, as they write chunks of data from memory to storage. Ensure that the storage subsystem can handle the write load (SSDs are a must), and be sure to monitor disk latency and throughput.
For the continued availability of the Nomad cluster, there should be 5-6 servers in the cluster. Nomad uses the Raft consensus algorithm, so you should ensure that a quorum is always maintained. The best way to achieve this depends on the available network and datacenter topology:
- If there are 3 or more failure domains (datacenters close to one another (< 10ms latency), independent datacenter rooms or racks, etc.), deploy 6 servers and make use of the Redundancy zones feature, which ensures that there is the correct number of voting servers in each failure domain and increases the fault tolerance and performance of the cluster.
- If there are only 2 failure domains (2 datacenters), which makes it impossible to place servers in a way that always maintains quorum, consider either deploying the servers in only one of the failure domains, or deploying two clusters of 5 servers each, federated together.
- If there aren't any clear failure domains (single datacenter), deploy 5 servers.
Client Requirements
Client hardware requirements vary significantly based on workload types. Generally, unless you are running in an edge scenario, avoid very small client sizes. As a rule of thumb, aim for at least 4 cores and 8GB of memory per client, adapting as needed based on workload requirements and distribution. Clients don't have to be located in the same datacenter(s) as the servers, and can be connected over unreliable networks (in which case the behaviour of workloads can be tuned with the `disconnect` block).
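For example, a minimal sketch of a job that tunes disconnect handling at the task group level (the job name, driver, image, and field values are illustrative only; consult the `disconnect` block documentation for the full set of options):

```hcl
job "edge-app" {
  group "web" {
    # Illustrative settings: keep allocations on a disconnected client for up to
    # one hour before the scheduler considers them lost, and schedule
    # replacements elsewhere in the meantime.
    disconnect {
      lost_after = "1h"
      replace    = true
    }

    task "server" {
      driver = "docker"

      config {
        image = "nginx:stable"
      }
    }
  }
}
```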
Network Requirements
Nomad requires a few ports to be open for bidirectional communication between agents. The defaults are:
Port | Name | Purpose | Protocol |
---|---|---|---|
4646 | HTTP API | Used by clients and servers to serve the HTTP API | TCP only |
4647 | RPC | Used for internal RPC communication between client agents and servers, and for inter-server traffic | TCP only |
4648 | Serf WAN | Used by servers to gossip both over the LAN and WAN to other servers. Not required for Nomad clients to reach this address | TCP and UDP |
For optimal performance, ensure:
- Low-latency connectivity between servers (< 10ms round-trip time)
- Sufficient bandwidth for job artifacts and container images
- Network segmentation between server and client traffic
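If the hosts run a local firewall, open these ports between the relevant machines. For example, with firewalld (ufw or iptables work equally well; adjust zones and source ranges to your environment):

```shell
sudo firewall-cmd --permanent --add-port=4646-4648/tcp
sudo firewall-cmd --permanent --add-port=4648/udp
sudo firewall-cmd --reload
```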
Certificates
Create a standard X.509 certificate that will be installed on the Nomad servers. Follow your organization’s process for creating a new certificate that matches the DNS record you intend to use for accessing Nomad. This guide assumes the conventions and self-signed certificates from the Nomad Architecture and Enable TLS tutorial.
You will need a total of three files:
- CA public certificate (`nomad-agent-ca.pem`)
- Server node certificate’s private key (`global-server-nomad-key.pem`)
- Server node public certificate (`global-server-nomad.pem`)
These certificates will be distributed to the server nodes in a later section and used to secure communications between Nomad clients, servers, and the API.
Additionally, a fourth certificate and key were generated: `global-cli-nomad.pem` and `global-cli-nomad-key.pem`. These will be used later in this guide to interact with the cluster. The `global-` prefix for the certificate and key assumes a multi-region environment. You can generate certificates on a per-region basis; however, in this guide we use a single certificate and key across all clusters globally.
Note
Ensure the certificate’s Common Name (CN) or Subject Alternative Names (SANs) include the DNS names you’ll use to access Nomad, such as `nomad.yourdomain.com`. If you’re using separate DNS names for server-to-server communication, include these in the SANs as well. While Nomad's TLS configuration will be production ready, key management and rotation is a complex subject not covered by this guide. Vault is the suggested solution for key generation and management.
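As an illustration, the files listed above can be produced with Nomad's built-in TLS helper. The following is a minimal sketch based on the Enable TLS tutorial; the additional DNS name is a placeholder, and flag availability may vary by Nomad version (check `nomad tls cert create -h`):

```shell
# Create a CA (nomad-agent-ca.pem and nomad-agent-ca-key.pem)
nomad tls ca create

# Create the server certificate and key for the global region,
# adding the DNS name used to reach Nomad through the load balancer as a SAN
nomad tls cert create -server -region global -additional-dnsname "nomad.yourdomain.com"

# Create the CLI certificate and key used later in this guide
nomad tls cert create -cli
```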
Load Balancing the API and UI
To load balance the UI and API of the servers (either for human/machine access, CI/CD pipeline consumption, or for initial discovery to join the cluster), you can use any load balancer, in layer 4 or layer 7 mode (caveat: the web UI's `exec` into an allocation uses websockets, so if you need it for debugging purposes, your load balancer must allow for that). Use the `/v1/agent/health` endpoint for health checking. TLS termination can be done at the load balancer to allow for full mTLS with client verification to be enabled without impacting human users connecting to the web UI.
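If you use HAProxy, a minimal layer 4 sketch might look like the following (server names and IP addresses are placeholders; TLS is passed through so Nomad terminates mTLS itself, while health checks speak HTTPS to the API):

```
frontend nomad_api
    bind *:4646
    mode tcp
    default_backend nomad_servers

backend nomad_servers
    mode tcp
    option httpchk GET /v1/agent/health
    http-check expect status 200
    server nomad-server-1 10.0.0.11:4646 check check-ssl verify none
    server nomad-server-2 10.0.0.12:4646 check check-ssl verify none
    server nomad-server-3 10.0.0.13:4646 check check-ssl verify none
```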
Auto-join
The `server_join` configuration option is used to specify the initial set of Nomad servers to join (both for servers to form the cluster, and for clients to join it). This can be a list of IP addresses or DNS names of the servers or a load balancer in front of them, or a cloud auto-join config. There are cloud auto-join providers for a number of on-premises orchestrators (such as VMware vSphere and OpenStack), as well as mDNS. At least one of the provided IPs/DNS names/cloud auto-join providers must be reachable for the Nomad agent to join the cluster; once that happens, all other servers are discovered automatically, so it's not an issue if some of the servers are unreachable.
Servers:
```hcl
server {
  enabled          = true
  bootstrap_expect = 3

  server_join {
    retry_join = [ "1.1.1.1", "2.2.2.2" ]
  }
}
```
Clients:
```hcl
client {
  enabled = true

  server_join {
    retry_join = [ "1.1.1.1", "2.2.2.2" ]
  }
}
```
Obtain the Nomad Enterprise License File
Obtain the Nomad Enterprise license file from your HashiCorp account team. This file contains a license key unique to your environment. The file will be named something like `nomad.hclic`.
Keep this file handy, as you will need it later in the installation process.
Download and Install the Nomad CLI
To interact with your Nomad cluster, you’ll need to install the Nomad binary. Follow the steps for your OS on the Install Nomad documentation page.
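If you prefer to fetch the binary directly — for example, to stage it on the server machines at `/usr/local/bin/nomad`, as the systemd unit later in this guide expects — a minimal sketch for Linux amd64 follows. The version is a placeholder; Enterprise builds carry a `+ent` suffix:

```shell
# Placeholder version - use the Nomad Enterprise release you are licensed for
export NOMAD_VERSION="1.8.4+ent"

curl -fsSLO "https://releases.hashicorp.com/nomad/${NOMAD_VERSION}/nomad_${NOMAD_VERSION}_linux_amd64.zip"
unzip "nomad_${NOMAD_VERSION}_linux_amd64.zip"
sudo install nomad /usr/local/bin/nomad
nomad version
```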
Deployment Process
Infrastructure bootstrapping
Create the necessary infrastructure for your Nomad cluster, using the tooling at your disposal that is best suited for the job - e.g. for bare metal machines that would be PXE boot or an automated provisioning system such as Canonical MAAS or Tinkerbell; for virtual machines that would be Terraform. Ensure that the servers have the necessary network connectivity and storage available, and set up any related infrastructure (such as load balancers, firewall and switch configs, DNS entries, etc.).
The following steps should be automated as much as possible with your preferred config management + image management tools, to ensure consistency and repeatability.
Create necessary directories
```shell
sudo mkdir -p /opt/nomad/data
sudo mkdir -p /etc/nomad.d
sudo mkdir -p /etc/nomad.d/tls
```
Copy TLS certificates
Copy your TLS certificates to `/etc/nomad.d/tls/`:

```shell
sudo scp nomad-agent-ca.pem global-server-nomad-key.pem global-server-nomad.pem ubuntu@server-machine:/etc/nomad.d/tls/
```
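After copying, it is good practice to restrict access to the private key; for example (adjust ownership if you run the agent as a non-root user):

```shell
sudo chmod 0600 /etc/nomad.d/tls/global-server-nomad-key.pem
sudo chmod 0644 /etc/nomad.d/tls/nomad-agent-ca.pem /etc/nomad.d/tls/global-server-nomad.pem
```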
Copy License File
echo "02MV4UU43BK5..." >> /etc/nomad.d/license.hclic
Create Gossip Encryption Key
```shell
nomad operator gossip keyring generate
```
Save the output of this key to use in the configuration file.
Create Nomad configuration file
Create `/etc/nomad.d/nomad.hcl` with the following content or supply your own. The example below includes:
- Autopilot, which handles automated upgrades
- Redundancy zones: Nomad will use these values to partition the servers by redundancy zone/failure domain, and will aim to keep one voting server per zone. Extra servers in each zone will stay as non-voters on standby, to be promoted if the active voter leaves or dies.
- TLS: enables TLS between nodes and for the API. At scale, it's recommended to use HashiCorp Vault; however, for starting out it is acceptable to use signed certificates copied to the machine.
data_dir = "/opt/nomad/data"
bind_addr = "0.0.0.0" #CHANGE ME to listen only on the required IPs
server {
enabled = true
bootstrap_expect = 3
server_join {
retry_join = [ "1.1.1.1", "2.2.2.2" ] #CHANGE ME
}
redundancy_zone = "dc1-rack-b51" #CHANGE ME
license_path = "/etc/nomad.d/license.hclic"
encrypt = "YOUR-GOSSIP-KEY" #Paste your key from the above "Create Gossip Encryption Key" step
}
acl {
enabled = true
}
tls {
http = true
rpc = true
ca_file = "/etc/nomad.d/tls/nomad-agent-ca.pem"
cert_file = "/etc/nomad.d/tls/global-server-nomad-key.pem"
key_file = "/etc/nomad.d/tls/global-server-nomad.pem"
}
autopilot {
cleanup_dead_servers = true
last_contact_threshold = "200ms"
max_trailing_logs = 250
server_stabilization_time = "10s"
enable_redundancy_zones = true
disable_upgrade_migration = false
enable_custom_upgrades = false
}
Starting Nomad
Set up Nomad service
```shell
sudo nano /etc/systemd/system/nomad.service
```
```ini
[Unit]
Description=Nomad
Documentation=https://www.nomadproject.io/docs/
Wants=network-online.target
After=network-online.target

[Service]
# Nomad servers should be run as the nomad user.
# Nomad clients should be run as root
User=root
Group=root
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/local/bin/nomad agent -config /etc/nomad.d
KillMode=process
KillSignal=SIGINT
LimitNOFILE=65536
LimitNPROC=infinity
Restart=on-failure
RestartSec=2
TasksMax=infinity
OOMScoreAdjust=-1000

[Install]
WantedBy=multi-user.target
```
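After creating the unit file, reload systemd so it picks up the new unit:

```shell
sudo systemctl daemon-reload
```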
Start the Nomad service
```shell
sudo systemctl enable nomad
sudo systemctl start nomad
```
Validate the cluster is bootstrapped and functioning
Verify you have the expected number of servers with the following commands:
```shell
nomad server members
nomad operator raft list-peers
```
If you are running into issues with your cluster bootstrapping or the Nomad agent starting, you can view journal logs to reveal any errors:
```shell
sudo systemctl status nomad
sudo journalctl -xe
```
Typically, issues are due to invalid configuration entries within `nomad.hcl`, certificate problems, or networking issues where the nodes cannot communicate with each other.
Initialize ACL system
Log into one of your servers and run the following commands:
Set environment variables
```shell
export NOMAD_ADDR=https://127.0.0.1:4646
export NOMAD_CACERT=/etc/nomad.d/tls/nomad-agent-ca.pem
export NOMAD_CLIENT_CERT=/etc/nomad.d/tls/global-server-nomad.pem
export NOMAD_CLIENT_KEY=/etc/nomad.d/tls/global-server-nomad-key.pem
```
Bootstrap the ACL system. This only needs to be done on one server; the ACLs will sync across server nodes.
```shell
nomad acl bootstrap
```
Save the Accessor ID and Secret ID and follow the Identity and Access Management section of the Nomad Operating Guide for more information on ACLs and post-installation tasks.
Interact with your Nomad Cluster
```shell
export NOMAD_ADDR=https://nomad-lb
export NOMAD_TOKEN=YourSecretID
export NOMAD_CACERT=./nomad-agent-ca.pem
export NOMAD_CLIENT_CERT=./global-cli-nomad.pem
export NOMAD_CLIENT_KEY=./global-cli-nomad-key.pem

nomad -autocomplete-install && complete -C /usr/bin/nomad nomad
```
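A quick way to confirm that the CLI can reach the cluster through the load balancer and that the token is valid:

```shell
nomad acl token self
nomad server members
```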
Client Installation
The client installation process is similar to the server installation, with a few key differences in the configuration. Nomad clients are responsible for running tasks and jobs scheduled by the Nomad servers. They register with the server cluster, receive work assignments, and execute tasks.
Client TLS Configuration
Nomad clients should use a different certificate than the server nodes. From the same directory as the CA certificate you generated during the server installation steps:
```shell
nomad tls cert create -client
```

This will generate:
- `global-client-nomad.pem` (client certificate)
- `global-client-nomad-key.pem` (client private key)
Copy certificates to the client machines:
```shell
scp nomad-agent-ca.pem global-client-nomad.pem global-client-nomad-key.pem ubuntu@client-machine:/etc/nomad.d/tls/
```
Remember to keep your private keys secure and implement proper certificate management practices, including regular rotation and secure distribution of certificates to your Nomad clients.
In your `nomad.hcl` file:
```hcl
# Base configuration
data_dir  = "/opt/nomad/data"
bind_addr = "0.0.0.0" # CHANGE ME to listen only on the required IPs

# Client configuration
client {
  enabled = true

  # Server join configuration
  server_join {
    retry_join = [ "1.1.1.1", "2.2.2.2" ]
  }

  # Node pool configuration
  # Optionally configure a node pool for this client to join
  # node_pool = "general"
}

# Security configuration
tls {
  http = true
  rpc  = true

  verify_server_hostname = true

  ca_file   = "/etc/nomad.d/tls/nomad-agent-ca.pem"
  cert_file = "/etc/nomad.d/tls/global-client-nomad.pem"
  key_file  = "/etc/nomad.d/tls/global-client-nomad-key.pem"
}

# Telemetry configuration
telemetry {
  publish_allocation_metrics = true
  publish_node_metrics       = true
  prometheus_metrics         = true
}
```
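Clients use the same systemd unit and start procedure as the servers. Assuming the unit file from the server section is already in place on the client machine:

```shell
sudo systemctl daemon-reload
sudo systemctl enable --now nomad

# From a machine with CLI access to the cluster, confirm the client has registered
nomad node status
```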
Post-Installation Tasks
After installation, perform these tasks to ensure everything worked as expected:
Verify cluster health:
```shell
nomad server members # lists all servers
nomad node status    # lists all clients
```
For day 2 activities, such as observability, backups, etc., consult the Nomad Operating Guide.