Service discovery

Service discovery is the process of automatically detecting and locating services within your infrastructure. In the context of Nomad Enterprise, it allows applications and services to find and communicate with each other without hard-coding IP addresses and ports.

Service discovery is a more advanced form of DNS that goes beyond translating domain names into IP addresses. It provides the following additional capabilities:

Real-time updates as services come online or go offline
Health status information about services
Metadata about services (for example version, capabilities)
Load balancing across multiple instances of a service
Support for dynamic environments where service locations frequently change

These advanced features make service discovery an essential component of modern distributed systems, enabling flexible and resilient application architectures. In a shared service environment, where multiple teams deploy various applications and microservices, service discovery is crucial for several reasons:

Ensures that clients can always locate the services they need, even as services scale up or down or move between hosts.
Facilitates load balancing and enhances system resilience by allowing clients to discover all healthy instances of a service and distribute requests among them.
Eliminates the need for manual reconfiguration or calls to external systems, such as an IPAM, when service locations change.

Note

While Nomad Enterprise provides basic service discovery capabilities, these cover services running within the Nomad cluster itself. Nomad does not offer native support for external service discovery, ingress routing, or a DNS interface.

To enable comprehensive service discovery and routing, including for external clients, integrate Nomad Enterprise with additional tooling that specializes in service discovery, such as Consul. Consul provides robust service discovery, a DNS interface, and a mature integration pattern with Nomad Enterprise.
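As a rough sketch of what the Consul integration looks like in a job file, the service block below registers with Consul instead of the native Nomad provider. It assumes the Nomad clients are already configured to talk to a Consul agent; the service name, port label, and check values are reused from the examples on this page for illustration only.

```hcl
service {
  name     = "web-service"
  port     = "http"
  provider = "consul" # the default provider; shown explicitly for clarity

  check {
    type     = "http"
    path     = "/healthz"
    interval = "10s"
    timeout  = "5s"
  }
}
```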

Health checks

Health checking is a crucial component of service discovery and management in Nomad Enterprise. It ensures that only healthy, properly functioning service instances are available for discovery and use by other applications. Service health checks are tests that Nomad Enterprise performs to determine the status of a service.

Visit the Service Discovery on Nomad tutorial for a guide on how to implement it in your job file. Use the check and service documentation pages as a reference for the available options.

Nomad supports the following capabilities for native service discovery.

Note

Specify provider = "nomad" in the service block, because the provider defaults to Consul.

HTTP health checks

This example configures an HTTP health check for a web application. It sends a GET request to the /healthz endpoint every 10 seconds, and each request times out if it receives no response within 5 seconds.

  service {
    name = "web-service"
    port = "http"
    provider = "nomad"

    check {
      type     = "http"
      path     = "/healthz"
      interval = "10s"
      timeout  = "5s"
      method   = "GET"
    }
  }

If your application has multiple HTTP endpoints, you can use multiple HTTP service checks. You can also add custom headers. This example shows two services:

An HTTP GET request to the /healthz endpoint over port 8080 using an Authorization header.
An HTTP GET request to the /v1/db-check endpoint to ensure the API service is able to communicate with the database.

If both checks return a 200 status code, the health check passes. If one fails, they both fail.

group {

  network {
    port "api" {
      to = 8080
    }
  }

  task "api" {

    #...

    service {
      name     = "api-service"
      port     = "api"
      provider = "nomad"

      check {
        type     = "http"
        path     = "/healthz"
        interval = "10s"
        timeout  = "5s"
        method   = "GET"
        header {
          Authorization = ["Basic ZWxhc3RpYzpjaGFuZ2VtZQ=="]
        }
      }
    }

    service {
      name     = "db-ready"
      port     = "api" # must match a port label defined in the network block
      provider = "nomad"

      check {
        type     = "http"
        path     = "/v1/db-check"
        interval = "10s"
        timeout  = "5s"
        method   = "GET"
      }
    }

    #...
  }
}

Use multiple checks to verify different components or functionalities of your service and adjust intervals based on the importance and expected response time of each check.
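As a minimal sketch of this advice, a single service block can carry more than one check, each with its own interval and timeout. The check names and the specific interval and timeout values below are illustrative assumptions, not recommendations.

```hcl
service {
  name     = "api-service"
  port     = "api"
  provider = "nomad"

  # Cheap liveness probe: run it often to detect failures quickly.
  check {
    name     = "api-liveness" # hypothetical check name
    type     = "http"
    path     = "/healthz"
    interval = "10s"
    timeout  = "2s"
  }

  # Heavier dependency probe: run it less often and allow more time.
  check {
    name     = "api-db-ready" # hypothetical check name
    type     = "http"
    path     = "/v1/db-check"
    interval = "30s"
    timeout  = "5s"
  }
}
```

Giving each check a name makes it easier to tell which probe failed when you inspect the allocation.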

Note

Nomad treats any non-2xx response code as a failure. Make sure your application's health endpoint returns a 2xx code only when the application is healthy.

TCP health check

Alternatively, use a TCP health check for services where a straightforward connection test is sufficient. Adjust the interval based on the criticality of the service and the acceptable detection time for failures. This example attempts to establish a TCP connection to the port labeled endpoint, defined in the group's network block, every 10 seconds. The check fails if the connection is not established within 5 seconds.

  service {
    name = "app"
    port = "endpoint"
    provider = "nomad"

    check {
      type     = "tcp"
      port     = "endpoint"
      interval = "10s"
      timeout  = "5s"
    }
  }

If the application uses multiple ports, use multiple service checks. Remember, if one check fails, they both fail.

group {

  network {
    port "endpoint1" {
      to = 8080
    }
    port "endpoint2" {
      to = 9000
    }
  }

  task {

    #...

    service {
      name     = "app_endpoint1"
      port     = "endpoint1"
      provider = "nomad"

      check {
        type     = "tcp"
        port     = "endpoint1" # must match a port label defined in the network block
        interval = "10s"
        timeout  = "5s"
      }
    }

    service {
      name     = "app_endpoint2"
      port     = "endpoint2"
      provider = "nomad"

      check {
        type     = "tcp"
        port     = "endpoint2"
        interval = "10s"
        timeout  = "5s"
      }
    }

    #...
  }
}

Upgrades

Use health checks during upgrades to ensure the application is healthy before promoting it. When performing an upgrade, setting health_check = "checks" in the update block makes Nomad Enterprise wait for the service checks of the tasks to pass before it considers an allocation healthy. If the health checks fail and auto_revert is set to true, Nomad Enterprise rolls back to the last stable version of the job.

job {
  update {
    health_check = "checks"
    auto_revert  = true
  }
  group "frontend" {
    count = 3

    network {
      port "http" {
        to = 8080
      }
    }
    task "web" {
      service {
        name     = "web-service"
        port     = "http"
        provider = "nomad"

        check {
          type     = "http"
          path     = "/healthz"
          interval = "10s"
          timeout  = "5s"
          method   = "GET"
        }
      }
    }
  }
}
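The update block accepts further parameters that control how strictly health is judged during a rollout. The sketch below shows a few of them together; the durations are assumed values chosen for illustration and should be tuned to how long your application takes to start and stabilize.

```hcl
update {
  max_parallel     = 1        # replace one allocation at a time
  health_check     = "checks" # judge health by the service checks
  min_healthy_time = "15s"    # assumed value: checks must keep passing this long
  healthy_deadline = "3m"     # assumed value: mark the allocation unhealthy after this
  auto_revert      = true     # return to the last stable job version on failure
}
```

A longer min_healthy_time trades rollout speed for more confidence that a new allocation is stable before Nomad moves on to the next one.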

Leveraging service discovery in tasks

For services to communicate with each other without knowing each other's IP address or DNS host name, use the template feature. Templates, combined with service discovery, offer a way to dynamically configure tasks based on the current state of your infrastructure. Nomad renders these templates when a task starts and updates them either periodically or in response to specific events, ensuring your tasks always have the most up-to-date service information.

For service discovery, the dynamically retrieved data is the IP address and port of another task. The following diagram shows the templating flow:

Template rendering flow

Below are two different example tasks that need to communicate with each other.

App1 depends on App2 and uses a hard-coded IP address in an environment variable. This approach does not scale, breaks when Nomad Enterprise schedules App2 on a different host, and pins traffic to one instance when there are multiple instances. It poses additional problems if App2 uses dynamic ports, which can also change whenever the task is restarted or rescheduled.

job "app1" {
  group "app1group" {
    network {
      port "app1" {}
    }
    task "server" {
      env {
        APP2_IP = "172.16.30.5"
      }
      service {
        name     = "app1"
        port     = "app1"
        provider = "nomad"
      }
    }
  }
}

job "app2" {
  group "app2group" {
    count = 3
    network {
      port "app2" {}
    }
    task "server" {
      service {
        name     = "app2"
        port     = "app2"
        provider = "nomad"
      }
    }
  }
}

To fix this, render the environment variable from a template so that App1 always has the current IP address and port of App2.

job "app1" {
  group "app1group" {
    network {
      port "app1" {}
    }
    task "server" {
      template {
        # Look up the registered instances of app2 through Nomad service discovery.
        data = <<EOH
{{ range nomadService "app2" }}
APP2_IP={{ .Address }}:{{ .Port }}
{{ end }}
EOH
        destination = "local/vars.env"
        env         = true
      }
      service {
        name     = "app1"
        port     = "app1"
        provider = "nomad"
      }
    }
  }
}

If the IP address or port of App2 changes, Nomad Enterprise automatically restarts the App1 task by default. Use the change_mode parameter to determine what action occurs when Nomad Enterprise detects a change in the rendered template.
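As an illustration of change_mode, the sketch below renders the App2 instances into a configuration file and sends the task a signal instead of restarting it. The destination file name and its line format are hypothetical, and the example assumes the application reloads its configuration when it receives SIGHUP.

```hcl
template {
  # Write one line per registered app2 instance.
  data = <<EOH
{{ range nomadService "app2" }}
server {{ .Address }}:{{ .Port }}
{{ end }}
EOH

  destination   = "local/app2-upstreams.conf" # hypothetical file name
  change_mode   = "signal"                    # the default is "restart"
  change_signal = "SIGHUP"                    # assumes the task reloads on SIGHUP
}
```

This variant renders to a file rather than to environment variables, because a running process cannot pick up new environment values without a restart.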

Templating is a powerful tool that allows a lot of creativity and flexibility for your workloads. Visit the template documentation page for more information.

Application scaling

Secrets and Variables