Implement circuit breaking in Consul service mesh with Envoy
When you have a set of services calling each other, a service failure or increased latency in one service can result in cascading failures to downstream dependencies. These cascading failures can exhaust application resources. One way to prevent this type of failure is the circuit breaker pattern. A circuit breaker quickly detects failures and returns an error for a fixed amount of time.
This tutorial covers:
- Using a service mesh to eject failed instances and prevent cascading failures
- Configuring Consul to implement circuit breaking
Circuit breaking is typically achieved through user code or network configuration. While some libraries and frameworks offer circuit breaking through code, if you have services written in a variety of languages you can easily introduce inconsistency across implementations. It may also be the case that adding additional application code is either not desirable or not possible. Implementing circuit breaking using the service mesh may be more desirable since it reduces application code and decouples infrastructure concerns from application logic.
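For comparison, the in-application approach might look like the following minimal sketch. It is illustrative only and not part of this tutorial's repository; the `CircuitBreaker` class, its parameters, and `CircuitOpenError` are hypothetical names. The sketch fails fast for a fixed window after a run of errors, then lets a single trial call through.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without being attempted."""


class CircuitBreaker:
    """Hypothetical in-process circuit breaker, for illustration only."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to fail fast once open
        self.clock = clock                # injectable clock, handy for testing
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # The window has elapsed: allow one trial call. A single failure
            # on this trial re-opens the circuit immediately.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result
```

Every service would need an equivalent of this logic in its own language, which is the inconsistency the service mesh approach avoids.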
To implement circuit breaking in Consul, you must configure two Envoy settings: one for the Envoy circuit breaking feature and another for Envoy outlier detection. Envoy circuit breaking implements the bulkhead pattern, which limits the maximum connections, pending requests, and concurrent requests for a pool of upstream service instances. Envoy outlier detection handles the ejection of services that are flagged by the circuit breaker. If an upstream service instance returns the maximum allowed consecutive HTTP 5xx errors, Envoy will eject the service instance from the pool of upstream service instances. The combination of these two settings effectively implements the circuit breaker pattern, first by detecting failures and then by ejecting the failed service instance.
In this tutorial, you will start two services: api and web. The web service connects to the upstream api service using Consul service mesh, which load balances between two versions of the api service. You will introduce an error into api-v2 that trips the circuit breaker, sets it to an open state, and sends requests directly to the working api-v1.
This tutorial focuses on configuring Consul for circuit breaking. The Consul deployment in this tutorial is not suitable for production environments as it includes Consul dev agents, internal proxies, and mock services.
Prerequisites
- Docker Compose
- Docker
- jq
- curl
- Consul 1.9.0+: The UI described here is only available in Consul version 1.9.0 and later.
Clone the GitHub repository that contains the files you'll use with this tutorial.
Change directories into the repository you just cloned.
Check out the tagged version verified for this tutorial.
This directory will be referred to as your working directory, and you will run the rest of the commands in this tutorial from this directory.
Configure services and defaults
Once you have set up the prerequisites, configure the Consul services to use Envoy's circuit breaker and outlier detection.
Open central_config/global-defaults.hcl. The web and api services default to the HTTP protocol. You can set the central configuration to this default. Consul propagates this configuration to Envoy.
Open service_config/api-v1.hcl. This file configures the sidecar proxy for api-v1. A similar configuration exists for api-v2.
Open service_config/web.hcl for the Consul configuration of the web service. The web service connects to the api service upstream. By default, Consul load balances requests round-robin between each version of the api service without additional configuration.
In addition to defining upstream services, set the proxy.config.limits for the maximum connections, pending requests, and concurrent requests. The upstream api service represents a pool with api-v1 and api-v2.
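To make the effect of these limits concrete, here is a simplified sketch of a bulkhead that admits, queues, or rejects requests. It is a toy model under stated assumptions, not Envoy's actual implementation: the `Bulkhead` class and its method names are hypothetical, and connection limits are left out for brevity.

```python
class Bulkhead:
    """Toy model of per-upstream limits; names are hypothetical."""

    def __init__(self, max_concurrent_requests, max_pending_requests):
        self.max_concurrent_requests = max_concurrent_requests
        self.max_pending_requests = max_pending_requests
        self.active = 0   # requests currently in flight
        self.pending = 0  # requests waiting for a free slot

    def admit(self):
        """Classify a new request as 'active', 'pending', or 'rejected'."""
        if self.active < self.max_concurrent_requests:
            self.active += 1
            return "active"
        if self.pending < self.max_pending_requests:
            self.pending += 1
            return "pending"
        # Both limits are exhausted: fail fast instead of piling up work.
        return "rejected"

    def finish(self):
        """Complete one active request and promote a pending one, if any."""
        if self.active == 0:
            raise RuntimeError("no active request to finish")
        self.active -= 1
        if self.pending > 0:
            self.pending -= 1
            self.active += 1


bulkhead = Bulkhead(max_concurrent_requests=2, max_pending_requests=1)
decisions = [bulkhead.admit() for _ in range(4)]
print(decisions)
```

Rejecting the fourth request immediately is the point of the pattern: a slow upstream pool cannot consume unbounded resources in the caller.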
Configure the proxy.config.passive_health_check setting to trip after 10 api service failures (with the max_failures setting) and retry an ejected instance after 30 seconds (with the interval setting).
You will simulate a failure in the api-v2 service instance. The api-v2 service instance will be ejected after 10 HTTP 5xx errors. Consul will check the service every 30 seconds to determine if api-v2 should rejoin the cluster.
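The ejection rule can be sketched as a counter of consecutive 5xx responses per instance. This is a minimal illustration of the idea, not Envoy's outlier detection code: `OutlierDetector` and `record` are hypothetical names, and re-admission after the interval is deliberately omitted.

```python
class OutlierDetector:
    """Toy consecutive-5xx detector; names are hypothetical."""

    def __init__(self, max_failures=10):
        self.max_failures = max_failures
        self.consecutive = {}  # instance name -> current run of 5xx responses
        self.ejected = set()

    def record(self, instance, status):
        """Record one response and report whether the instance is ejected."""
        if 500 <= status < 600:
            self.consecutive[instance] = self.consecutive.get(instance, 0) + 1
            if self.consecutive[instance] >= self.max_failures:
                self.ejected.add(instance)
        else:
            # Any non-5xx response resets the run of failures.
            self.consecutive[instance] = 0
        return instance in self.ejected


detector = OutlierDetector(max_failures=10)
for _ in range(10):
    is_ejected = detector.record("api-v2", 500)
print(is_ejected, detector.ejected)
```

Note that a single successful response resets the count, so only an unbroken run of errors triggers ejection in this model.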
Note
If you do not specify limits or passive_health_check, Consul uses Envoy's outlier detection defaults.
Deploy Consul and the services
After configuring circuit breaking settings, deploy a Consul server, the web service, an api-v1 and an api-v2 service, and Prometheus. Prometheus aggregates metrics from services, which can be viewed in the Consul UI.
Configure api-v1
Open docker-compose.yaml. This file deploys Consul, the web service and sidecar proxy, the api-v1 service and sidecar proxy, and Prometheus. The docker-compose.yaml uses volume mounts to add the central and service configurations to the containers.
Configure api-v2
Open docker-compose-success.yaml. This file configures api-v2 without introducing a failure into the service. Recall that Consul load balances requests between service instances that share a service name. In this case, Consul directs requests to the api service and load balances between api-v1 and api-v2.
Deploy the services
Deploy Consul, the web service, the api-v1 service, the api-v2 service, and Prometheus.
Open the Consul UI in your browser at http://localhost:8500.
The web and api services register as healthy.
Note
If you are connecting this tutorial to your own Consul instance, you will need to configure intentions to allow traffic between the web and api services.
Use curl to make an API request to the web service. It returns which version of the api service it called and the HTTP status code. Running the request multiple times illustrates the load balancing of requests between api-v1 and api-v2.
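The alternating pattern you see in the responses follows from round-robin selection. The sketch below models it with a hypothetical `round_robin` helper and is not how Consul or Envoy selects hosts internally.

```python
from collections import Counter
from itertools import cycle


def round_robin(instances, request_count):
    """Return the instance chosen for each of request_count requests."""
    picker = cycle(instances)  # repeats the instance list endlessly, in order
    return [next(picker) for _ in range(request_count)]


choices = round_robin(["api-v1", "api-v2"], 10)
print(choices[:4])
print(Counter(choices))
```

With two healthy instances, each receives half of the requests.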
Trip the circuit breaker
Deploy a failing instance of api-v2.
Perform a GET request to the web service a few times. Any time the request gets forwarded to api-v1, the status code registers an HTTP 200 (success). However, requests forwarded to api-v2 return an HTTP 500 error.
Recall that you configured Envoy's outlier detection to set the circuit breaker to open after 10 failed attempts to connect to api-v2. Trip the circuit breaker by sending 1000 API requests to the web service. Any time the web service calls api-v2, it receives an HTTP 500 error. Eventually, Envoy will flag api-v2 as an outlier and eject it from the api cluster. Any additional requests to api-v2 cause the proxy to immediately throw an HTTP 503 error (indicated by null,503) and direct all future requests to api-v1.
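A small simulation shows why most of the 1000 requests still succeed. It is a sketch under simplifying assumptions: strict round-robin between the two instances, api-v2 always returning HTTP 500, ejection after 10 consecutive failures, and no re-admission or 503 responses during the run. The `simulate` function is hypothetical and is not part of the tutorial's repository.

```python
from collections import Counter


def simulate(total_requests=1000, max_failures=10):
    """Count HTTP statuses seen by the caller while api-v2 is failing."""
    instances = ["api-v1", "api-v2"]
    consecutive_failures = 0
    ejected = False
    statuses = Counter()
    turn = 0
    for _ in range(total_requests):
        # Once api-v2 is ejected, only api-v1 remains in the pool.
        pool = ["api-v1"] if ejected else instances
        target = pool[turn % len(pool)]
        turn += 1
        status = 500 if target == "api-v2" else 200
        statuses[status] += 1
        if status == 500:
            consecutive_failures += 1
            if consecutive_failures >= max_failures:
                ejected = True
    return statuses


print(simulate())
```

Under these assumptions only the first 10 calls to api-v2 fail, and every later request is served by api-v1. The real run differs because Envoy periodically retries the ejected instance.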
Verify ejection of failing service instance
To verify that outlier detection ejected api-v2, open this link in your browser. This link opens a Prometheus graph of the envoy_cluster_outlier_detection_ejections_active metric.
Click the "Execute" button to refresh the graph.
This gauge illustrates how Envoy manages the ejection of api-v2. The first time api-v2 reaches the failure threshold, Envoy ejects it for 30 seconds, which is the interval setting. When Envoy retries api-v2, Envoy receives an error and ejects api-v2 a second time. This time api-v2 will be ejected for 60 seconds because it has been ejected twice. As api-v2 continues to return an error, the base ejection time of 30 seconds gets multiplied by the number of times the host has been ejected. The active ejection period becomes longer as the service continues to error.
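The growth of the ejection period described above is a simple multiplication. The sketch below expresses it with a hypothetical `ejection_seconds` helper; the optional `max_seconds` cap is an assumption added for illustration and is not a setting described in this tutorial.

```python
def ejection_seconds(base_seconds, ejection_count, max_seconds=None):
    """Ejection period: base time multiplied by the number of ejections so far."""
    if ejection_count < 1:
        raise ValueError("ejection_count must be at least 1")
    duration = base_seconds * ejection_count
    if max_seconds is not None:
        # Hypothetical upper bound so the period cannot grow without limit.
        duration = min(duration, max_seconds)
    return duration


for count in (1, 2, 3):
    print(count, ejection_seconds(30, count))
```

With a base of 30 seconds, the first three ejections last 30, 60, and 90 seconds.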
Check availability of web service
Recall that a circuit breaker prevents cascading failures to downstream services, such as the web service. The web service should remain available since api-v1 successfully completes requests and the circuit breaker is open.
Periodically, the proxy will determine if it should bring api-v2 back into the pool of service instances. Given that api-v2 will continue to return an error, the web service should register a failure quickly and its next request will be to api-v1.
Open the Consul UI in your browser to view the api service's topology.
The graph includes the error rates of the web service and api service. Periodically, the api service returns an error but the overall error rate of the web service remains less than 1%. When the circuit breaker opens, it prevents cascading errors to the web service.
Clean up
Once you are done with this tutorial, you can stop the requests to the web service.
Delete the containers for Consul, Prometheus, web, api-v1, and api-v2.
Next steps
Now that you have completed this tutorial you have familiarized yourself with how to manually configure Consul and Envoy for circuit breaking. You simulated a failing upstream service and tripped the circuit breaker to eject the failing service from the pool.
To learn how to configure Consul health checks without Envoy, complete the tutorial to Ensure Only Healthy Services are Discoverable.
In the next tutorial, Load Balancing Services in Consul Service Mesh with Envoy, you will learn how to change the load balancing policy for a service in your mesh.