Nomad - Guide for workload modernization with Traefik API gateway, Consul and Vault
HashiCorp products | Nomad, Consul, Vault |
---|---|
Partner products | Traefik Enterprise - API gateway |
Maturity model | Standardizing |
Use case coverage | Workload modernization |
Guide type | HVD integration guide |
Tags | Workload modernization, API gateway |
Publish date | Oct/2024 |
Authors | Shaun Empie [Traefik], Adrian Todorov [HashiCorp] |
Purpose of this guide
This workload modernization integration guide with Traefik Enterprise embeds a feature-rich, enterprise-grade API gateway into HashiCorp Nomad clusters running Consul and Vault, to seamlessly and securely publish services.
Target audience
- Operations teams responsible for API gateways.
- Operations teams responsible for HashiCorp Vault, Nomad, and Consul.
- Security teams responsible for centralizing access control (authentication and authorization).
Benefits
- Use Traefik Enterprise as an API gateway for a Nomad cluster.
- Leverage Traefik Enterprise's Consul catalog provider for dynamic service configuration and discovery.
- Manage TLS certificates on the fly with Traefik Enterprise's Vault PKI integration and KV store.
- Deploy distributed rate-limiting and distributed certificates across all proxies.
Prerequisites
- Nomad cluster running with Docker as the runtime environment.
- Consul agent deployments across all nodes.
- Vault server deployed and unsealed.
Integration architecture
Prescriptive guidance
Our objective in this validated design is to give you prescriptive, best practice guidance based on our experience partnering with numerous organizations who have implemented Traefik, Nomad, Consul, and Vault together.
People and process considerations
Operations teams (API gateways):
- Design and maintain Traefik configuration.
- Integrate Traefik with Consul for service discovery and Vault for secret management.
Operations teams (HashiCorp suite):
- Deploy and manage Vault, Nomad, and Consul clusters.
- Establish secure configurations and best practices for each tool (refer to the relevant HashiCorp Validated Designs for each tool).
Security teams (authentication and authorization):
- Define and enforce access control policies across the stack.
- Work with Platform teams to implement authentication and authorization in Vault.
Step 1: Prepare Vault for Traefik Enterprise job
Task: Store secrets in Vault
Persona: Security team
Description: Store all variables and secrets in Vault to secure and automate the installation process.
Enable the KV secrets engine in Vault.
vault secrets enable -path=secret kv-v2
Export the Traefik Enterprise license key and store it in Vault.
export TRAEFIKEE_LICENSE=
vault kv put secret/traefikee/license license_key="${TRAEFIKEE_LICENSE}"
Generate a Traefik plugin registry token and store it in Vault.
vault kv put secret/traefikee/plugin token=$(openssl rand -base64 10)
Enable the PKI secrets engine.
vault secrets enable pki
Generate a PKI root certificate.
vault write pki/root/generate/internal common_name="VAULT PKI CERT"
Export the allowed domains (for example, *.nomad.traefik.io) and create a Vault role to allow Traefik to request a certificate from Vault for these domains. Quote the variable so the shell does not expand the wildcard.
export WILDCARD_DOMAIN=
vault write pki/roles/traefikee allowed_domains="${WILDCARD_DOMAIN}" allow_glob_domains=true max_ttl=10h
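The wildcard value is easy to mangle in a shell, so a quick local check can help before writing the role. The following is a minimal sketch that uses the shell's own glob matching, which only approximates how Vault evaluates allow_glob_domains. The helper name matches_allowed_domain and the hostname whoami.nomad.traefik.io are illustrative assumptions, not part of the guide.

```shell
# Hypothetical helper: returns 0 when hostname $1 matches glob pattern $2.
# Shell glob matching is only an approximation of Vault's allow_glob_domains rules.
matches_allowed_domain() {
  # $2 is left unquoted inside case so that it is treated as a pattern.
  case "$1" in
    $2) return 0 ;;
    *)  return 1 ;;
  esac
}

# Single quotes stop the shell from expanding * against local file names.
WILDCARD_DOMAIN='*.nomad.traefik.io'

if matches_allowed_domain "whoami.nomad.traefik.io" "$WILDCARD_DOMAIN"; then
  echo "whoami.nomad.traefik.io is covered by ${WILDCARD_DOMAIN}"
fi
```

If the check fails for a hostname you expect Traefik to request, adjust WILDCARD_DOMAIN before running the vault write command.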
In this example, the role traefikee is allowed to request a certificate for the demo application whoami. This role will be referenced as part of the Traefik static-config to allow it to communicate with Vault to request a certificate for this domain.
Create a policy to allow Traefik to interact with a Vault PKI engine using token authentication.
pki-acl.hcl
path "pki/issue/traefikee*" {
  capabilities = ["create", "read", "update"]
}
vault policy write traefikee-pki-policy pki-acl.hcl
Ensure that Nomad Workload Identities and Vault ACL are configured.
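One way to confirm this prerequisite is to check that a JWT auth method is mounted at the jwt-nomad path that the role command in this step writes to. The sketch below assumes VAULT_ADDR and VAULT_TOKEN are already set and that the default table output of vault auth list shows the mount path in the first column. The function name vault_auth_mounted is a hypothetical helper.

```shell
# Hypothetical helper: returns 0 when an auth method is mounted at the given
# path (default: jwt-nomad). Assumes the vault CLI is already authenticated.
vault_auth_mounted() {
  mount_path="${1:-jwt-nomad}"
  # "vault auth list" prints one mount per row; the first column is the
  # mount path with a trailing slash, for example "jwt-nomad/".
  vault auth list | awk -v p="${mount_path}/" '$1 == p { found = 1 } END { exit !found }'
}
```

For example, run `vault_auth_mounted jwt-nomad || echo "jwt-nomad auth method is not enabled"` before creating the role.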
Create a JWT authentication role to allow Traefik to authenticate to Vault.
vault-auth-role-traefikee.json
{
  "role_type": "jwt",
  "bound_audiences": ["vault.io"],
  "user_claim": "/nomad_job_id",
  "user_claim_json_pointer": true,
  "claim_mappings": {
    "nomad_namespace": "nomad_namespace",
    "nomad_job_id": "nomad_job_id",
    "nomad_task": "nomad_task"
  },
  "token_type": "service",
  "token_policies": ["traefikee-pki-policy", "traefikee-access-secret"],
  "token_period": "1h",
  "token_explicit_max_ttl": 0
}
vault write auth/jwt-nomad/role/traefikee '@vault-auth-role-traefikee.json'
Create an AppRole to allow Traefik to authenticate with Vault (specifically for the static.yaml config).
vault auth enable approle
vault write auth/approle/role/traefikee \
  token_policies="traefikee-pki-policy,traefikee-access-secret" \
  token_ttl=1h \
  token_max_ttl=24h \
  secret_id_ttl=24h
vault read auth/approle/role/traefikee/role-id
vault write -f auth/approle/role/traefikee/secret-id
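The RoleID and SecretID are needed again when you fill in static.yaml in step 5, so it can be convenient to capture them in variables rather than copy them from the terminal. This is a minimal sketch that uses the vault CLI's -field output option. The function name fetch_traefikee_approle is a hypothetical helper, and the sketch assumes the vault CLI is already authenticated.

```shell
# Hypothetical helper: reads the RoleID and issues a new SecretID for the
# traefikee AppRole, then exports both for later steps.
fetch_traefikee_approle() {
  ROLE_ID="$(vault read -field=role_id auth/approle/role/traefikee/role-id)" || return 1
  # Every call issues a new SecretID, which expires after the role's secret_id_ttl.
  SECRET_ID="$(vault write -f -field=secret_id auth/approle/role/traefikee/secret-id)" || return 1
  export ROLE_ID SECRET_ID
}
```

Call `fetch_traefikee_approle` once and avoid echoing SECRET_ID, because it is a credential.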
Step 2: Create a persistent volume
Task: Set up persistent storage for Traefik Enterprise deployment
Persona: Operations team
Description: The Traefik Enterprise controller requires a persistent volume to store its data. This can be achieved using CSI storage plugins or by pre-selecting the nodes that will run the controller and binding a directory from the host as a volume.
In this example, you will use the host volume approach. To do so, you need to create a directory on the node before you expose it to Nomad:
Create a directory on the Nomad nodes for the Traefik Controller and Plugin Registry.
mkdir -p /opt/traefikee /opt/traefikee-plugins
Modify the Nomad client configuration to expose the directories as host volumes. Be sure to restart the Nomad service on each node so Nomad picks up the changes.
client {
  ..snip..
  host_volume "traefikee-data" {
    path      = "/opt/traefikee"
    read_only = false
  }
  host_volume "traefikee-plugins" {
    path      = "/opt/traefikee-plugins"
    read_only = false
  }
  ..snip..
}
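After the restart, it is worth confirming that the client actually registered the volumes before scheduling the controller. The sketch below assumes it runs on the Nomad client node and that the verbose node status lists volumes under a "Host Volumes" heading with the volume name in the first column. The function name host_volume_registered is a hypothetical helper.

```shell
# Hypothetical helper: returns 0 when the local Nomad client reports a host
# volume with the given name. Assumes a "Host Volumes" section in the output.
host_volume_registered() {
  nomad node status -self -verbose | awk -v v="$1" '
    /^Host Volumes/       { in_section = 1; next }   # start of the section
    in_section && NF == 0 { in_section = 0 }         # a blank line ends it
    in_section && $1 == v { found = 1 }
    END { exit !found }'
}
```

For example, `host_volume_registered traefikee-data && host_volume_registered traefikee-plugins` should succeed on every node that may run the controller or the plugin registry.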
Step 3: Deploy Traefik Enterprise
Task: Deploy Traefik Enterprise
Persona: Operations team
Description: Deploy the Traefik Enterprise job specification files to the Nomad cluster.
Create a namespace for Traefik Enterprise.
nomad namespace apply -description "Traefik Enterprise Namespace" traefikee
Create a Vault access policy to allow Traefik to access specific secrets.
traefikee-access-secret.hcl
path "secret/data/traefikee/license" {
  capabilities = ["read"]
}
path "secret/data/traefikee/plugin" {
  capabilities = ["read"]
}
path "secret/data/traefikee/prxjointoken" {
  capabilities = ["read"]
}
path "secret/data/traefikee/static_config" {
  capabilities = ["read"]
}
vault policy write traefikee-access-secret traefikee-access-secret.hcl
Deploy the Traefik Enterprise Controller specification file.
traefikee-controllers.hcl
job "traefikee-controllers" {
  datacenters = ["*"]
  namespace   = "traefikee"
  type        = "service"

  ## Update one at a time, with 30s between each to give them time to stabilize
  update {
    stagger      = "30s"
    max_parallel = 1
  }

  ## Controller Task Group definition
  group "controllers" {
    count = 1

    network {
      ## Port used by controllers to advertise and exchange data on the cluster.
      port "control" {
        static = 4242
        to     = 4242
      }
      ## Port used by teectl to connect to the cluster API.
      port "api" {
        static = 55055
        to     = 55055
      }
    }

    ## Persistent volume to store the cluster state, must already exist in the Host if you are binding a directory from the host.
    volume "data" {
      type      = "host"
      read_only = false
      source    = "traefikee-data"
    }

    ## Service registered in consul
    service {
      name     = "traefikee-controllers"
      port     = "control"
      task     = "controllers"
      provider = "consul"

      check {
        type     = "tcp"
        port     = "control"
        interval = "30s"
        timeout  = "5s"
        failures_before_critical = "2"
      }
    }

    task "controllers" {
      driver = "docker"

      vault {
        role = "traefikee"
      }

      ## Volume mount location is related to the --statedir flag.
      volume_mount {
        volume      = "data"
        destination = "/data"
      }

      template {
        data = <<EOH
TRAEFIKEE_LICENSE="{{ with secret "secret/data/traefikee/license" }}{{.Data.data.license_key}}{{end}}"
TRAEFIKEE_PLUGIN_TOKEN="{{ with secret "secret/data/traefikee/plugin" }}{{.Data.data.token}}{{end}}"
CTRL_ADDR={{ range $index, $service := service "traefikee-controllers" }}{{ if ne $index 0 }},{{ end }}{{ .Address }}:4242{{ end }}
EOH
        destination = "secrets/traefikee.env"
        env         = true
        change_mode = "restart"
        splay       = "30s" ## Up to 30s random delay to avoid all proxies restarting at the same time.
      }

      config {
        image        = "traefik/traefikee:v2.11.7"
        network_mode = "bridge"
        cap_add      = ["NET_BIND_SERVICE"]
        ports = [
          "control",
          "api",
        ]
        args = [
          "controller",
          "--name=${NOMAD_ALLOC_NAME}",
          "--advertise=${NOMAD_ADDR_control}",
          "--discovery.static.peers=${CTRL_ADDR}",
          "--license=${TRAEFIKEE_LICENSE}",
          "--statedir=/data",
          "--jointoken.file.path=/data/tokens",
          "--api.autocerts",
          "--socket=${NOMAD_TASK_DIR}/cluster.sock",
          "--api.socket=${NOMAD_TASK_DIR}/api.sock",
          "--log.level=INFO",
          "--log.filepath=/alloc/logs/traefikee.log",
          "--log.format=",
        ]
      }

      resources {
        cpu    = 500 ## MHz
        memory = 256 ## MB
      }
    }
  }
}
nomad run traefikee-controllers.hcl
Generate the join token manually from the controller and store it in the variable TRAEFIKEE_PROXY_TOKEN.
TRAEFIKEE_PROXY_TOKEN=$(nomad alloc exec -namespace traefikee -i=false -t=false -task controllers \
  $(nomad job status -namespace traefikee traefikee-controllers | grep controller | grep running | awk '{ print $1 }' | head -n 1) \
  /traefikee tokens --socket local/cluster.sock \
  | grep TRAEFIKEE_PROXY_TOKEN | sed 's/export TRAEFIKEE_PROXY_TOKEN=//')
This token is required by the proxy instances to connect to the controller.
Store the join token in Vault so it can be retrieved by the proxies during deployment.
vault kv put secret/traefikee/prxjointoken proxy_join_token="${TRAEFIKEE_PROXY_TOKEN}"
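If the allocation lookup or the tokens command fails, the pipeline above silently yields an empty variable, and an empty token would then be written to Vault. The sketch below separates the parsing from the write and refuses empty values. It assumes, as the sed expression in this step implies, that the tokens output contains a line of the form export TRAEFIKEE_PROXY_TOKEN=<value>. The function names are hypothetical helpers.

```shell
# Hypothetical helper: reads "traefikee tokens" output on stdin and prints
# only the proxy join token value.
extract_proxy_token() {
  sed -n 's/^export TRAEFIKEE_PROXY_TOKEN=//p'
}

# Hypothetical helper: stores the token in Vault, refusing an empty value.
store_proxy_token() {
  token="$1"
  if [ -z "$token" ]; then
    echo "proxy join token is empty; not writing to Vault" >&2
    return 1
  fi
  vault kv put secret/traefikee/prxjointoken proxy_join_token="$token"
}
```

Pipe the output of the nomad alloc exec command into `extract_proxy_token`, assign the result to TRAEFIKEE_PROXY_TOKEN, and pass it to `store_proxy_token`.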
Deploy the Traefik Enterprise proxy specification file.
traefikee-proxies.hcl
job "traefikee-proxies" {
  datacenters = ["*"]
  namespace   = "traefikee"
  type        = "system"

  update {
    stagger      = "30s"
    max_parallel = 1
  }

  ## Proxy Task Group Definition
  group "proxies" {
    count = 1

    network {
      ## Using "host" mode on proxies to expose the ports directly in the host and enable external load balancers
      mode = "host"

      ## Port used by distributed features
      port "distributed" {
        static = 8484
      }

      ## Traefik entryPoints
      port "web" {
        static = 80
        to     = 80
      }
      port "websecure" {
        static = 443
        to     = 443
      }

      ## Add any other entrypoints listed in the static config here and in the 'ports' map for the task driver.
      # port "myentrypoint" {}
    }

    ## Service registered in Consul
    service {
      name     = "traefikee-proxies"
      port     = "web"
      task     = "proxies"
      provider = "consul"
    }

    task "proxies" {
      driver = "docker"

      vault {
        role = "traefikee"
      }

      kill_timeout = "30s"

      ## Place the token from a Vault secret into an environment variable.
      ## CTRL_ADDR is rendered as a comma-separated list of controller peers with the
      ## control port appended, using the same pattern as the controller job.
      template {
        data = <<EOF
CTRL_ADDR={{ range $index, $service := service "traefikee-controllers" }}{{ if ne $index 0 }},{{ end }}{{ .Address }}:4242{{ end }}
PROXY_JOIN_TOKEN="{{ with secret "secret/data/traefikee/prxjointoken" }}{{.Data.data.proxy_join_token}}{{end}}"
EOF
        destination = "secrets/traefikee.env"
        env         = true
        change_mode = "restart"
        splay       = "30s" ## Up to 30s random delay to avoid all proxies restarting at the same time.
      }

      config {
        image = "traefik/traefikee:v2.11.7"
        ports = [
          "web",
          "websecure",
          "distributed",
        ]
        args = [
          "proxy",
          "--discovery.static.peers=${CTRL_ADDR}",
          "--jointoken.value=${PROXY_JOIN_TOKEN}",
          "--log.level=",
          "--log.filepath=",
          "--log.format=",
        ]
        cap_drop = [
          "ALL",
        ]
        cap_add = [
          "NET_BIND_SERVICE",
        ]
      }

      resources {
        cpu    = 500 ## MHz
        memory = 256 ## MB
      }
    }
  }
}
Deploy the Traefik Enterprise plugin registry specification file.
traefikee-registry.hcl
job "traefikee-registry" {
  datacenters = ["*"]
  namespace   = "traefikee"
  type        = "service"

  ## Plugin Registry Task Group Definition
  group "traefikee-registry" {
    network {
      mode = "bridge"
      port "https" {
        static = 8443
      }
    }

    service {
      name     = "traefikee-registry"
      port     = "https"
      task     = "registry"
      provider = "consul"
    }

    count = 1

    volume "plugins" {
      type      = "host"
      read_only = false
      source    = "traefikee-plugins"
    }

    task "registry" {
      driver = "docker"

      vault {
        role = "traefikee"
      }

      ## CTRL_ADDR is rendered as a comma-separated list of controller peers with the
      ## control port appended, using the same pattern as the controller job.
      template {
        data = <<EOH
TRAEFIKEE_JOIN_TOKEN="{{ with secret "secret/data/traefikee/prxjointoken" }}{{.Data.data.proxy_join_token}}{{end}}"
TRAEFIKEE_PLUGIN_TOKEN="{{ with secret "secret/data/traefikee/plugin" }}{{.Data.data.token}}{{end}}"
CTRL_ADDR={{ range $index, $service := service "traefikee-controllers" }}{{ if ne $index 0 }},{{ end }}{{ .Address }}:4242{{ end }}
EOH
        destination = "secrets/traefikee.env"
        env         = true
        change_mode = "restart"
      }

      config {
        image = "traefik/traefikee:v2.11.7"
        args = [
          "plugin-registry",
          "--name=traefik-registry",
          "--addr=:8443",
          "--discovery.static.peers=${CTRL_ADDR}",
          "--plugindir=/var/lib/plugins"
        ]
        ports = ["https"]
      }

      volume_mount {
        volume      = "plugins"
        destination = "/var/lib/plugins"
      }

      resources {
        cpu    = 500 ## MHz
        memory = 256 ## MB
      }
    }
  }
}
Verify Traefik Enterprise deployed successfully.
nomad job status -namespace traefikee traefikee-controllers
nomad job status -namespace traefikee traefikee-proxies

ID            = traefikee
Name          = traefikee
Submit Date   = 2024-04-16T19:59:31Z
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Node Pool     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group          Queued  Starting  Running  Failed  Complete  Lost  Unknown
controllers         0       0         1        0       0         0     0
proxies             0       0         1        1       0         0     0
traefikee-registry  0       0         1        1       0         0     0

Latest Deployment
ID          = 1ed47168
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group          Desired  Placed  Healthy  Unhealthy  Progress Deadline
controllers         1        1       1        0          2024-04-16T20:10:17Z
proxies             1        1       1        0          2024-04-16T20:09:47Z
traefikee-registry  1        1       1        0          2024-04-16T20:09:47Z

Allocations
ID        Node ID   Task Group          Version  Desired  Status   Created    Modified
dfd68508  482e6bb2  proxies             0        run      running  2h56m ago  2h55m ago
238cbc03  482e6bb2  traefikee-registry  0        run      running  2h56m ago  2h55m ago
b6b1ab4b  482e6bb2  controllers         0        run      running  2h57m ago  2h57m ago
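Reading the status output by eye works for a one-off check, but the same test can be scripted. The sketch below extracts the top-level Status field, relying on the "Status = running" line format shown in the sample output. The function names job_status_from_output and job_is_running are hypothetical helpers.

```shell
# Hypothetical helper: prints the first "Status" value found in
# "nomad job status" output read on stdin. The later deployment status
# line is ignored because awk exits after the first match.
job_status_from_output() {
  awk -F ' *= *' '$1 == "Status" { sub(/ +$/, "", $2); print $2; exit }'
}

# Hypothetical helper: returns 0 when the named job in the traefikee
# namespace reports a status of "running".
job_is_running() {
  status="$(nomad job status -namespace traefikee "$1" | job_status_from_output)"
  [ "$status" = "running" ]
}
```

For example, loop over traefikee-controllers, traefikee-proxies, and traefikee-registry and report any job for which `job_is_running` fails.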
Step 4: Install teectl tool
Task: Install teectl tool
Persona: Operations team
Description: teectl is a tool to manage and interact with Traefik Enterprise cluster deployments.
Download the teectl tool.
curl -OL https://s3.amazonaws.com/traefikee/binaries/v2.11.6/teectl_v2.11.6_linux_amd64.tar.gz
Extract and copy the teectl tool to /usr/local/bin.
tar -xzf teectl_v2.11.6_linux_amd64.tar.gz teectl \
  && sudo mv teectl /usr/local/bin/teectl \
  && sudo chown root: /usr/local/bin/teectl \
  && sudo chmod +x /usr/local/bin/teectl
Set the NOMAD_CLIENT_IP environment variable to a comma-separated list of Nomad client IP addresses.
export NOMAD_CLIENT_IP="IP1,IP2,IP3"
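When the client addresses come from an inventory or another command, joining them by hand is error-prone. The sketch below builds the list from separate arguments. The addresses shown are placeholders and the function name join_by_comma is a hypothetical helper.

```shell
# Hypothetical helper: joins all arguments with commas.
join_by_comma() {
  # The subshell keeps the IFS change from leaking into the caller.
  ( IFS=,; printf '%s\n' "$*" )
}

# Placeholder addresses; replace them with the IPs of your Nomad clients.
NOMAD_CLIENT_IP="$(join_by_comma 10.0.0.11 10.0.0.12 10.0.0.13)"
export NOMAD_CLIENT_IP
echo "$NOMAD_CLIENT_IP"
```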
Generate teectl config from the Traefik Controller that was just deployed.
nomad alloc exec -namespace traefikee -i=false -t=false -task controllers $(nomad job status -namespace traefikee traefikee-controllers | grep controller | grep running | awk '{ print $1 }' | head -n 1) /traefikee generate c --swarm.hosts=${NOMAD_CLIENT_IP} --socket local/api.sock --cluster=nomad > nomad.yaml
Import the generated teectl config file.
teectl cluster import --file="nomad.yaml" --force
Use the newly imported cluster configuration as the default.
teectl cluster use --name nomad
Verify that the teectl command can interact with the Traefik Enterprise cluster.
teectl get nodes

ID                         NAME                      STATUS  ROLE
9jzp822r208axm1z7jfnzz6kx  i-084836f9668f50a61       Ready   Proxy / Ingress
rt5aagqlkh68s50wnykqd8neg  traefikee.controllers[0]  Ready   Controller (Leader)
yoz2pj079nf1tjlpq17nw2hu8  traefik-registry          Ready   Plugin Registry
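The node table can also be checked in a script, for instance before applying configuration. The sketch below reads the table on stdin and assumes the layout shown above: one header row, and the status in the third whitespace-separated column (so node names must not contain spaces). The function name all_nodes_ready is a hypothetical helper.

```shell
# Hypothetical helper: returns 0 only when at least one node is listed and
# every listed node has STATUS "Ready".
all_nodes_ready() {
  awk 'NR > 1 && NF > 0 { total++; if ($3 != "Ready") bad++ }
       END { exit (total == 0 || bad > 0) }'
}
```

For example, `teectl get nodes | all_nodes_ready || echo "cluster is not fully ready"`.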
Step 5: Apply Traefik Enterprise static configuration
Task: Apply Traefik static-config
Persona: Operations team
Description: Traefik Enterprise has two types of configuration:
- Static configuration: This is the startup configuration where entryPoints and providers are defined. entryPoints are the network entry points that Traefik listens on, while providers are the infrastructure components that Traefik connects to. This configuration does not change often.
- Dynamic configuration: This configuration contains everything that defines how requests are handled by the system. It can change at any time and is seamlessly hot-reloaded without any request interruption or connection loss.
Update static.yaml with VAULT_ADDRESS, CONSUL_ADDRESS, and the ROLE_ID/SECRET_ID values generated in step 1.
Apply the static configuration using a teectl command.
teectl apply --file=static.yaml
To verify, you can use teectl to view the running static configuration.
teectl get static-config
static.yaml
entryPoints:
  web:
    address: ":80"
    transport:
      lifeCycle:
        requestAcceptGraceTimeout: 15
        graceTimeOut: 10
  websecure:
    address: ":443"
    transport:
      lifeCycle:
        requestAcceptGraceTimeout: 15
        graceTimeOut: 10

api:
  dashboard: true

apiportal:
  path: spec.yaml

## Traefik will monitor ConsulCatalog for any new service and publish it automatically if the service has the traefik specific tags.
providers:
  consulCatalog:
    prefix: "traefik"
    exposedByDefault: false
    endpoint:
      address: "CONSUL_ADDRESS:8500"
      scheme: "https"

certificatesResolvers:
  ## Enable Traefik integration with Vault's PKI engine as cert resolver
  vault-pki:
    vault:
      url: "https://VAULT_ADDRESS:8200"
      role: "traefikee"
      auth:
        appRole:
          roleID: "ROLE_ID"
          secretID: "SECRET_ID"
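Filling in the four placeholders by hand is easy to get wrong, so the substitution can be scripted. The sketch below treats a copy of static.yaml that still contains the placeholders as a template and writes a rendered file. It assumes VAULT_ADDRESS, CONSUL_ADDRESS, ROLE_ID, and SECRET_ID are set as shell variables and that their values contain no "|", "&", or backslash characters, which sed would misinterpret. The function name render_static_config and the file arguments are hypothetical.

```shell
# Hypothetical helper: substitutes the placeholders in a static.yaml template.
# $1 = template path, $2 = output path.
render_static_config() {
  for name in VAULT_ADDRESS CONSUL_ADDRESS ROLE_ID SECRET_ID; do
    eval "value=\${${name}:-}"
    if [ -z "$value" ]; then
      echo "${name} is not set" >&2
      return 1
    fi
  done
  # The rendered file contains the SecretID, so create it readable by the owner only.
  (
    umask 077
    sed \
      -e "s|VAULT_ADDRESS|${VAULT_ADDRESS}|g" \
      -e "s|CONSUL_ADDRESS|${CONSUL_ADDRESS}|g" \
      -e "s|\"ROLE_ID\"|\"${ROLE_ID}\"|g" \
      -e "s|\"SECRET_ID\"|\"${SECRET_ID}\"|g" \
      "$1" > "$2"
  )
}
```

For example, `render_static_config static.yaml.tpl static.yaml` followed by `teectl apply --file=static.yaml`, where static.yaml.tpl is an assumed name for the template copy.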
Step 6: Expose Traefik dashboard and API portal
Task: Apply Traefik dashboard and API portal
Persona: Operations team
Description: Traefik Enterprise comes with a dashboard that provides a detailed overview of the current status of your cluster, including detailed information on your cluster's configuration. Traefik Enterprise also includes an API Portal which groups all the API specifications from your services into a web UI.
Enable access to the internal resources.
dynamic.yaml
http:
  routers:
    dashboard:
      entryPoints:
        - "websecure"
      rule: Host(`dashboard.nomad.demo.traefiklabs.tech`)
      service: api@internal
      tls:
        certResolver: vault-pki
    apiportal:
      entryPoints:
        - "websecure"
      rule: Host(`apiportal.nomad.demo.traefiklabs.tech`)
      service: apiportal@internal
      tls:
        certResolver: vault-pki
teectl apply --file=dynamic.yaml
Step 7: Deploy the demo application
Task: Apply demo application
Persona: Application team
Description: With Traefik Enterprise fully configured, you can now start deploying applications.
Deploy the whoami test application, expose it through port 443, and secure it using a certificate generated by the Vault PKI provider.
whoami.hcl
job "whoami" {
  datacenters = ["dc1"]
  namespace   = "apps"

  group "whoami" {
    count = 1

    network {
      port "web" {}
    }

    service {
      name = "whoami"
      port = "web"
      tags = [
        "traefik.enable=true",
        "traefik.http.routers.whoami.rule=Host(`whoami.nomad.demo.traefiklabs.tech`)",
        "traefik.http.routers.whoami.entrypoints=websecure",
        "traefik.http.routers.whoami.tls.certresolver=vault-pki",
      ]
      ## These tags expose the service through Traefik.
      ## Traefik will match any request with this specific URL on port 443 (websecure).
      ## It will request a certificate for this service using the PKI-Vault provider defined in the static-config file.

      check {
        type     = "http"
        path     = "/health"
        port     = "web"
        interval = "10s"
        timeout  = "2s"
      }
    }

    task "whoami" {
      driver = "docker"

      config {
        image = "traefik/whoami"
        ports = ["web"]
        args  = ["--port", "${NOMAD_PORT_web}"]
      }

      resources {
        cpu    = 100
        memory = 128
      }
    }
  }
}
nomad namespace apply -description "Apps Namespace" apps
nomad run whoami.hcl
The dashboard should detect the new route.
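Beyond the dashboard, the route can be exercised from the command line. The sketch below sends a request to a proxy node and checks for an HTTP 200 response. It uses curl's --resolve option so no DNS record is required, and --insecure because the certificate is issued by the Vault PKI root, which client machines do not trust by default; passing the root certificate with --cacert instead is preferable where it is available. The function name check_route and the proxy IP in the usage line are hypothetical.

```shell
# Hypothetical helper: returns 0 when https://<host>/ answers with HTTP 200
# through the proxy at <proxy_ip>. $1 = host name, $2 = proxy IP address.
check_route() {
  host="$1"
  proxy_ip="$2"
  code="$(curl --silent --insecure --output /dev/null --write-out '%{http_code}' \
    --resolve "${host}:443:${proxy_ip}" "https://${host}/")" || return 1
  [ "$code" = "200" ]
}
```

For example, `check_route whoami.nomad.demo.traefiklabs.tech 10.0.0.11 && echo "route is live"`, where 10.0.0.11 stands in for one of your Nomad client IPs.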
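Alternatively, use Terraform to deploy the whoami application.
whoami.tf
# Address of a Nomad server. The variable name is illustrative; set it with -var or TF_VAR_nomad_srvr_addr.
variable "nomad_srvr_addr" {
  type = string
}

provider "nomad" {
  address = "http://${var.nomad_srvr_addr}:4646"
}

resource "nomad_job" "whoami" {
  jobspec = file("${path.module}/whoami.hcl")
}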