Introduction
HashiCorp Validated Designs (HVD) offer practitioners opinionated guidance for achieving production-grade deployments of Nomad Enterprise. These designs are purpose-built for delivering foundational Nomad use cases, with a baseline level of architectural and operational maturity. They draw on the field experiences of Solutions Engineers and Solutions Architects working with Nomad Enterprise customers.
This Solution Design Guide provides customers with an opinionated reference architecture, including key design decisions and the rationale behind them. Where applicable, the guide identifies modular design components that customers can adjust to align with organizational and/or regulatory requirements without compromising the overall integrity of the implementation. For customers deploying Nomad to the cloud, the guide also includes Terraform modules that can automate large portions of the infrastructure provisioning and software installation process.
Audience
This document is intended primarily for practitioners (including platform, networking, identity, and InfoSec teams) who want to deploy Nomad Enterprise clusters on-premises or on cloud infrastructure.
After using the Solution Design Guide to successfully deploy your Nomad Enterprise infrastructure, Nomad operators should refer to the Nomad: Operating Guide, which covers integrating application platforms, configuring machine and user access, centralizing secrets, performing upgrades, and more.
Document structure
Document section | Purpose |
---|---|
Architecture | Describes the components that make up this Validated Design for Nomad Enterprise. For each component, we discuss its purpose within the architecture and detail low-level requirements and recommendations (configuration, sizing, etc.). |
Detailed design | Expands on the architecture with more detail on each component. Carefully review this section to identify all technical and personnel requirements before moving on to implementation. |
Deploying Nomad Enterprise on premises, to bare metal or VMs | Guidance and recommended practices for installing Nomad Enterprise in a self-managed bare-metal or virtual environment. |
Deploying Nomad Enterprise in AWS on EC2 using Terraform | Installation of Nomad Enterprise using our validated Terraform modules. We currently provide modules for installing Nomad on AWS EC2; modules for other providers and deployment options will be provided in the future. |
Supported platforms and versions
As of HVD Document Version 1.0, the Solution Design Guide supports deploying Nomad Enterprise on the following compute platforms:
- Bare-metal or VM-based servers in a private datacenter
- AWS on EC2 instances
This guide applies to Nomad Enterprise version 1.9.3+ent and later. Nomad Enterprise can be downloaded from HashiCorp Releases or the HashiCorp Developer page.
While Nomad Enterprise is officially supported on a number of operating systems and processor architectures, this guide's examples use linux/amd64 unless otherwise specified.
Definitions and concepts
This documentation intentionally uses technology-agnostic terminology. However, some terms do not translate perfectly between providers. The following table defines the terms this document uses:
Term | Definition |
---|---|
Availability zone | An availability zone (or AZ) is a single network failure domain that hosts part or all of a Nomad cluster. Examples of availability zones include an individual datacenter; an air-gapped rack in a datacenter; an "Availability Zone" in AWS or Azure; or a "Zone" in GCP. |
Region | A physical location in a distinct geographic area containing one or more datacenters. |
Server node | A Nomad server node is a node that runs the Nomad agent in server mode. Server nodes collectively maintain cluster state and perform scheduling; see the example server configuration following this table. |
Leader node | The single server node, elected through the Raft consensus protocol, that holds authoritative cluster state, makes scheduling decisions, and handles write requests. |
Follower/standby node | Server nodes that replicate state from the leader and forward requests to it. They also service read requests when redundancy zones are enabled (see below). Followers vote to elect a new leader if the current leader is unavailable after a timeout threshold. |
Client node | A Nomad client node is a node that runs the Nomad agent in client mode. Client nodes are responsible for running tasks and reporting their status back to the Nomad servers; see the example client configuration following this table. |
Quorum | Quorum consists of a majority of voting nodes in a Nomad cluster: ⌊N/2⌋ + 1, where N = the total number of voting nodes. For example, if there are 3 voting nodes, we would need 2 nodes to form a quorum. Quorum ensures the integrity and consistency of the cluster's state. If a quorum of nodes is unavailable for any reason, the cluster becomes unavailable, meaning no client requests can be processed until quorum is once again achieved. Note: workloads that are already scheduled will continue to run. |
Redundancy zone (RZ) | Redundancy zones are a feature of Autopilot. A redundancy zone consists of multiple server nodes in the same user-defined "zone"; Autopilot automatically keeps one voting node per zone and marks the others as non-voting. This provides additional horizontal read scalability as well as increased fault tolerance for the Nomad cluster. If a voting node fails, a non-voting node in the same zone is automatically promoted to a voter to preserve quorum (see the server configuration example following this table). |
Cluster | Multiple Nomad server nodes operating together in high-availability mode. Each server node maintains an identical, replicated copy of the cluster state. |
Datacenter | A logical grouping of nodes in a Nomad cluster. This is an entirely user-defined and potentially arbitrary grouping, and doesn't have to reflect a physical datacenter. |
Node pools | A collection of client nodes that can have custom scheduling parameters and can be assigned to corresponding Nomad namespaces. Best used for grouping nodes with special characteristics, such as GPUs, ARM processors, high-performance hardware, or special networking; see the client and node pool examples following this table. |
Tokens | Tokens are the core method for authentication within Nomad. |
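To ground several of these terms, the following is a minimal sketch of a Nomad Enterprise server agent configuration, illustrating the region, datacenter, server node, and redundancy zone concepts defined above. The specific values shown (`us-east`, `dc1`, `az-a`, and file paths) are illustrative placeholders, not prescribed settings:

```hcl
# server.hcl — illustrative Nomad Enterprise server agent configuration
region     = "us-east"        # geographic region containing this cluster
datacenter = "dc1"            # user-defined logical grouping of nodes

data_dir = "/opt/nomad/data"  # placeholder path for local agent state

server {
  enabled          = true
  bootstrap_expect = 3   # wait for 3 servers before electing a leader

  # Enterprise-only: assign this server to an Autopilot redundancy zone.
  redundancy_zone = "az-a"
}

autopilot {
  # Enterprise-only: keep one voter per redundancy zone; if a voter
  # fails, promote a non-voter from the same zone to preserve quorum.
  enable_redundancy_zones = true
}
```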
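Likewise, a minimal sketch of a client agent configuration, showing client mode and node pool assignment. The pool name (`gpu`) and the server address are assumptions for illustration:

```hcl
# client.hcl — illustrative Nomad client agent configuration
datacenter = "dc1"
data_dir   = "/opt/nomad/data"

client {
  enabled = true

  # Place this node in a node pool so jobs can target it explicitly,
  # for example nodes with GPUs or other specialized hardware.
  node_pool = "gpu"

  # Servers for this client to register with (placeholder address,
  # using the default server RPC port).
  servers = ["nomad-servers.example.internal:4647"]
}
```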
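Node pools themselves are defined in a node pool specification. The sketch below assumes a hypothetical `gpu` pool and uses the Enterprise-only `scheduler_config` block to illustrate the custom scheduling parameters mentioned in the table; such a specification would typically be registered with `nomad node pool apply`:

```hcl
# gpu.nomad.hcl — illustrative node pool specification
node_pool "gpu" {
  description = "Client nodes with GPU hardware"

  # Enterprise-only: scheduling behavior specific to this pool.
  scheduler_config {
    scheduler_algorithm = "spread"  # spread allocations across nodes
  }
}
```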
Organizational requirements
Most tasks within this guide will be owned by a single team that is responsible for deploying shared infrastructure services, sometimes referred to as a platform team. However, a successful deployment of Nomad Enterprise requires collaboration across various functions within the organization. The platform team may interface with other teams, such as networking, information security, or identity, to perform tasks such as allocating IP space, managing certificates, defining identity roles and permissions, or overseeing DNS hosted zones and records. In cases where the infrastructure resides on a public cloud, these functions might converge under a unified cloud team.
The primary platform team should thoroughly review the Solution Design Guide to identify any teams they depend on, or require approval from, for the tasks involved in the deployment. The platform team should appoint a project lead to oversee the deployment process and ensure effective communication with dependent teams and functions.