Register External Services with Consul Service Discovery
Consul enables the registration and discovery of services internal to your infrastructure, as well as external services: third-party SaaS offerings and services running in environments where it is not possible to run the Consul agent directly.
In this tutorial, you will compare the process of registering and health checking internal and external services using Consul and Consul ESM (External Service Monitor). The only prerequisite is that you download Consul before you begin.
We will cover:
- Internal service registration and health checks
- External service registration and health checks
- Using Consul ESM to monitor the health of external services
- A discussion of pull vs. push health checking
This tutorial is an illustration of the differences between internal and external service monitoring, and is not intended for production use.
Start the Consul agent
For this tutorial, you will deploy a Consul agent running locally in -dev mode with the Consul UI enabled via the -ui flag. You'll use the -enable-script-checks flag to allow simple ping-based health checks, and will specify a node name of web rather than the default (the machine's hostname) to make the examples clearer.
Begin by starting the Consul agent.
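For example, combining the flags described above:

```shell
consul agent -dev -ui -enable-script-checks -node=web
```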
Throughout the tutorial, you will use curl to interact with Consul's HTTP API, as well as Consul's web UI, available at http://localhost:8500/ui.
Security Warning: Because -enable-script-checks allows script checks to be registered via the HTTP API, it may introduce a remote execution vulnerability known to be targeted by malware. For production environments, we strongly recommend using -enable-local-script-checks instead, which removes that vulnerability by allowing script checks to be defined only in the Consul agent's local configuration files, not via the HTTP API.
Register an internal service with a health check
First you will register an internal service. Internal services run on the same node (machine) as a Consul agent. You register internal services via service definitions, which you supply in configuration files that Consul loads from the agent's configuration directory when the agent starts, or after the agent has started via the local HTTP API endpoint at /agent/service/register. The local Consul agent is responsible for running any health checks registered for the service and updating the catalog accordingly.
Create a file called web.json with the following configuration.
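A minimal service definition consistent with the description below (the check name ping-check is illustrative, and the script check relies on the -enable-script-checks flag you set earlier):

```json
{
  "id": "web1",
  "name": "web",
  "port": 80,
  "check": {
    "name": "ping-check",
    "args": ["ping", "-c1", "hashicorp.com"],
    "interval": "30s"
  }
}
```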
The above web service will have the unique ID web1, the logical name web, run on port 80, and have one health check. The health check verifies that the web service can connect to the public internet by pinging hashicorp.com every 30 seconds. For an actual web service, you should configure a more useful health check. Consul provides several kinds of health checks, including: Script, HTTP, TCP, Time to Live (TTL), Docker, and gRPC.
Register the example web service by calling the HTTP API with a PUT request:
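```shell
curl --request PUT --data @web.json http://localhost:8500/v1/agent/service/register
```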
If you prefer to register services by providing a configuration file and restarting the Consul agent, check the Registering Services tutorial for instructions.
Verify that the example web service has been registered by querying the /catalog/service/:service endpoint:
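```shell
curl http://localhost:8500/v1/catalog/service/web
```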
Inspect all the health checks configured for services registered with a given local node using the /agent/checks endpoint:
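```shell
curl http://localhost:8500/v1/agent/checks
```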
You can also query the health of individual services with /health/service/:service and of nodes with /health/node/:node.
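For example, using the web service and the web node name set earlier:

```shell
curl http://localhost:8500/v1/health/service/web
curl http://localhost:8500/v1/health/node/web
```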
Navigate to your web service in the Consul UI and click on the "Health Checks" tab for a graphical view of your health check.
Register an external service with a health check
In the context of Consul, external services run on nodes where you cannot run a local Consul agent. These nodes might be inside your infrastructure (e.g. a mainframe, virtual appliance, or unsupported platform) or outside of it (e.g. a SaaS platform).
Because external services by definition don't have a local Consul agent, you can't register them with that agent or use it for health checking. Instead, you must register them directly with the catalog using the /catalog/register endpoint. In contrast to service registration, where the object context for the endpoint is a service, the object context for the catalog endpoint is the node. In other words, the /catalog/register endpoint registers an entire node, while the /agent/service/register endpoint registers individual services in the context of a node.
The configuration for an external service registered directly with the catalog is slightly different than for an internal service registered via an agent:
- Node and Address are both required, since they cannot be automatically determined from a local Consul agent.
- Services and health checks are defined separately.
- If a ServiceID is provided that matches the ID of a service on that node, the check is treated as a service-level health check instead of a node-level health check.
- The Definition field can be provided with details for a TCP or HTTP health check. For more information, check the Health Checks documentation.
To understand how external service registration works, register an external learn service provided by HashiCorp. First, create a configuration file called external.json.
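Based on the description that follows, external.json might look like this (the CheckID and Name values are illustrative):

```json
{
  "Node": "hashicorp",
  "Address": "learn.hashicorp.com",
  "NodeMeta": {
    "external-node": "true",
    "external-probe": "true"
  },
  "Service": {
    "ID": "learn1",
    "Service": "learn",
    "Port": 80
  },
  "Checks": [
    {
      "CheckID": "learn-http-check",
      "Name": "learn-http-check",
      "Status": "passing",
      "ServiceID": "learn1",
      "Definition": {
        "HTTP": "https://learn.hashicorp.com/consul/",
        "Interval": "30s"
      }
    }
  ]
}
```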
This configuration defines a node named hashicorp, available at the address learn.hashicorp.com, that provides a learn service with the ID learn1 running on port 80. It also defines an HTTP health check against https://learn.hashicorp.com/consul/ to be run every 30 seconds, and sets the initial state of that check to passing. Because the node's Address is defined, the node itself can also be pinged to verify reachability; Consul ESM will do this once you configure it later in this tutorial.
Typically, external services are registered via nodes dedicated to that purpose, so we'll continue our example using the Consul dev agent (localhost) as if the service were running on a different node (e.g. "External Services") from the one where our internal web service is registered.
This diagram visualizes the differences between how internal and external services are registered with Consul:
Use a PUT request to register the external service.
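For example:

```shell
curl --request PUT --data @external.json http://localhost:8500/v1/catalog/register
```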
Query the /catalog/service/:service endpoint to verify that the external service registered with the catalog, just like you did for the internal service.
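For example:

```shell
curl http://localhost:8500/v1/catalog/service/learn
```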
In the example of the internal web service, you verified your health check was active by querying the local agent endpoint /agent/checks. If you query this endpoint again after adding the external service, you won't see its health check listed, because it was registered with the service catalog, not the local agent. Instead, you'll need to query a catalog-level endpoint such as /health/service/:service, /health/node/:node, or /health/state/:state.
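For example, using the learn service and the hashicorp node defined in external.json:

```shell
curl http://localhost:8500/v1/health/service/learn
curl http://localhost:8500/v1/health/node/hashicorp
```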
You can, however, verify the service and its health check are listed in the Consul UI.
Because the check is listed, we know that there is an entry for it in the catalog; but because this node and service don't have a local agent to run the check, the node's health will not actually be monitored.
Simulate an outage
To see the difference between internal and external service health checks, query for node health after simulating an outage.
To simulate an outage, disconnect from the internet, wait a few moments, and look at your health checks in the Consul UI. Health checks for both services should fail while you are disconnected, but only the check for the internal web service is failing: the external learn service's check continues to show as passing, even though learn.hashicorp.com can't be reached.
The external service's health check doesn't update with the new failing status, because Consul's health checks rely on the local Consul agent to report status changes, and external services don't have local agents. An extra component is required to keep external service health checks up to date.
Monitor the external service with Consul ESM
Consul ESM is a daemon that runs alongside Consul to health check external nodes and update their status in the catalog. It allows externally registered services and checks to access the same features as if they were registered locally on Consul agents. You can run multiple instances of ESM for availability: the instances perform a leader election by holding a lock in Consul, and the leader continually watches Consul for updates to the catalog and runs the health checks defined on any external nodes it discovers.
The diagram from earlier is updated to show how Consul ESM works with Consul to monitor the health of external services:
Consul ESM is provided as a single binary. To install, download the latest release appropriate for your system and make it available in your PATH.
Open a new terminal and start Consul ESM with the consul-esm command:
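```shell
consul-esm
```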
If your internet provider does not allow UDP pings, you may have to set ping_type = "socket" in a config file and launch consul-esm with that config file. If you are using macOS, you will need to run with sudo.
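A minimal sketch, assuming a config file named esm-config.hcl (the file name is arbitrary):

```shell
# Write a config file that switches ESM to socket-based pings.
# The name esm-config.hcl is arbitrary.
echo 'ping_type = "socket"' > esm-config.hcl

# On macOS, socket pings require elevated privileges.
sudo consul-esm -config-file=esm-config.hcl
```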
The example external service definition you used included the following NodeMeta values that enable health monitoring by Consul ESM:
- "external-node" identifies the node as an external one that Consul ESM should monitor.
- "external-probe": "true" tells Consul ESM to perform regular pings to the node and maintain an externalNodeHealth check for the node (similar to the serfHealth check used by Consul agents).
Once Consul ESM is running, simulate an outage again by disconnecting from the internet. Now, in the Consul UI, you will see that the health checks for the external service are updated to critical, as they should be.
Consul ESM supports HTTP and TCP health checks. Ping health checks are added automatically with "external-probe": "true".
Discussion: push vs. pull health checking
Traditional health checking uses a pull model: on a regular interval, the monitoring server asks all the nodes if they are healthy, and the nodes respond with a status. This keeps the status of each node up to date, but creates a bottleneck, because as the number of nodes grows, so does the amount of traffic to the monitoring service. This clogs the local network and puts unnecessary load on the datacenter.
Consul solves this problem with a push-based model, where agents send events only upon status changes, which keeps the number of requests low. The problem with this edge-triggered monitoring is that there are no liveness heartbeats: in the absence of any updates, Consul assumes the server is still healthy, even if it has died. Consul gets around this by using a gossip-based failure detector. All datacenter members take part in a background gossip protocol, which imposes a constant load regardless of datacenter size. The combination of gossip-based failure detection and edge-triggered updates allows Consul to scale to very large datacenters without being heavily loaded.
By definition, external services don't have a local Consul agent to participate in gossip, so if the node running an external service dies, Consul won't know about it. This is why ESM is required for monitoring external services.
Next steps
Internal services are those provided by nodes where the Consul agent can run directly. They are registered with service definitions via the local Consul agent. The local Consul agent on the node is responsible for running any health checks registered for the service and updating the catalog accordingly.
External services run on nodes where the Consul agent cannot be run. They must be registered directly with the catalog because, by definition, they don't have a local Consul agent to register with.
Both internal and external services can have health checks associated with them. Health checking in Consul uses a push-based model where agents send events only upon status changes. Because this model requires a Consul agent to be running on the monitored node, health checks are not performed automatically for external services. To enable health monitoring for external services, use Consul ESM (External Service Monitor).
In this tutorial, you registered both internal and external services with health checks, using Consul ESM to monitor the health of the external service. Consul provides several kinds of health checks, including: Script, HTTP, TCP, Time to Live (TTL), Docker, and gRPC. To learn more about health check options in Consul, please visit the health check documentation.