Blue-Green Deployments with Waypoint, Nomad and Consul
In addition to common methods of deploying applications, HashiCorp Waypoint supports the orchestration of advanced deployment workflows, such as blue-green deployments. In a blue-green deployment, a new version of an application is deployed alongside the existing version in a separate environment. While both versions are running, DNS is updated to split traffic between the new and existing versions of the app. This allows deployments with essentially zero downtime in most cases, and it also lets operators expose the new version to only a portion of users rather than forcing everyone onto it at once. Based on feedback and testing while the new version is live and serving part of the traffic, a decision is made to either promote or roll back the new version, and DNS is again updated to route 100% of traffic to the chosen version of the application.
In the HashiStack, this is made possible with features from the workload orchestrator, HashiCorp Nomad, and the service networking tool, HashiCorp Consul. Nomad enables operators to deploy canary instances of a newer version of an application without destroying the existing instances; the canary deployment can be auto-promoted when "healthy", or promoted manually by operators. Consul plays a role as well through service configuration entries. The service resolver config entry controls which instances of a service are matched by downstream requests made to the service, and those instances can be filtered by service tags. A service splitter config entry splits a configured percentage of traffic between resolvers. These features of Nomad and Consul work together as follows: the canary allocations of a new Nomad job version register new instances of a Consul service with canary tags, those tags match a subset defined in the service resolver, and while both the new and old application instances are running, the service splitter is updated to send a percentage of traffic to each subset of the service.
To achieve this, a series of commands or API calls would need to be run manually, or automated through scripts or some other means; but Waypoint pipelines can orchestrate this! Waypoint pipelines can execute configured phases of your app's build, deployment, and release lifecycle, and run custom steps such as modifying Consul config entries to configure DNS to route traffic to your canary deployment instances.
An example of this configuration is located here. The Waypoint configuration file contains three pipelines, each of which is described in this use case.
Waypoint Configuration
Prerequisites
If experimenting with this use case, it will be helpful to have some familiarity with the tools and concepts listed below.
- Waypoint CLI
  - Context created and connected to a local Waypoint server or HCP Waypoint server
- Waypoint runner
  - Must be able to connect to the HashiCorp Nomad & HashiCorp Consul clusters
- Waypoint runner configuration
  - Used to connect to HashiCorp Nomad
- Docker registry
  - Credentials to push a build to the registry
- HashiCorp Consul cluster
- HashiCorp Nomad cluster
- Consul DNS
- curl
Pipelines
Pipeline 1: build-and-blue-green-deployment
This pipeline builds the application image, and subsequently executes pipeline #2 using a nested pipeline.
The build step uses the build stanza of the "app" that is configured later in the Waypoint configuration. This build uses the pack plugin to build an image with Cloud Native Buildpacks and then pushes it to a Docker registry.
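The full configuration lives in the example repository; purely as a hedged sketch of the shape of this pipeline (the project name, app name, and registry address below are placeholders, not the example's real values), the build pipeline and the app's build stanza could look something like:

```hcl
project = "blue-green-example" # placeholder project name

pipeline "build-and-blue-green-deployment" {
  step "build" {
    # Runs the build stanza of the app defined below
    use "build" {}
  }

  # A second step invokes the blue-green-deployment pipeline as a nested
  # pipeline; its exact wiring is omitted here and shown in the example repo.
}

app "app" {
  build {
    # Cloud Native Buildpacks build, pushed to a Docker registry
    use "pack" {}

    registry {
      use "docker" {
        image = "registry.example.com/app" # placeholder registry address
        tag   = gitrefpretty()
      }
    }
  }
}
```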
Pipeline 2: blue-green-deployment
This pipeline deploys the build artifact to Nomad and updates Consul to split traffic 50/50 between the new and old service instances. It is configured as a separate pipeline so that, in scenarios where the code does not need to be rebuilt into a new Docker image, only the deployment runs and the pipeline completes more quickly.
The split-traffic-to-green step in this pipeline first uses curl to download a file from the git repository into the container performing the exec step. Then, because the step runs the official Consul Docker image, the consul config write command is executed, which writes the downloaded configuration entry to the Consul cluster via the API. The configuration entry looks like the one below:
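The entry below is a representative reconstruction rather than the repository's exact file; the service name app and the 50/50 weights follow the description above.

```hcl
Kind = "service-splitter"
Name = "app"

Splits = [
  {
    # Send half of the traffic to the canary (green) instances
    Weight        = 50
    ServiceSubset = "green"
  },
  {
    # Keep the other half on the existing (blue) instances
    Weight        = 50
    ServiceSubset = "blue"
  },
]
```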
This is a service splitter config entry. The different "subsets" in this configuration, blue and green, align with a service resolver that has already been configured in the Consul cluster. The resolver's configuration (as a Terraform resource using the Consul provider) is depicted below, where instances of the "app" service tagged with blue are part of the blue subset, and those tagged with green are part of the green subset:
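The following is a sketch of such a resolver using the Consul provider's consul_config_entry resource; the DefaultSubset value and resource name are assumptions, while the filters simply match the blue and green tags described above.

```hcl
resource "consul_config_entry" "app_resolver" {
  kind = "service-resolver"
  name = "app"

  config_json = jsonencode({
    # Route to the blue subset unless a request targets a specific subset
    DefaultSubset = "blue"

    Subsets = {
      blue = {
        Filter = "\"blue\" in Service.Tags"
      }
      green = {
        Filter = "\"green\" in Service.Tags"
      }
    }
  })
}
```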
Since a Nomad job is being deployed in this pipeline, the tags on the service instances are derived from the Nomad jobspec template. Specifically, this is configured in the service stanza of the Nomad job. The tags are applied to the Nomad allocations running the current active version of the Nomad job, and canary_tags are applied to the canary Nomad allocations rolled out during a deployment. The exact number of canary Nomad allocations to be deployed, with the canary_tags from the service stanza, is configured in the update stanza's canary field:
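An abridged jobspec illustrating this arrangement is shown below; the job name, datacenter, port, and image are placeholders, and only the stanzas relevant to the canary behavior are included.

```hcl
job "app" {
  datacenters = ["dc1"]

  group "app" {
    update {
      # Roll out one canary allocation per deployment; promotion is manual
      canary       = 1
      auto_promote = false
    }

    network {
      port "http" {}
    }

    task "app" {
      driver = "docker"

      service {
        name = "app"
        port = "http"

        # Applied to allocations running the active job version
        tags = ["blue"]

        # Applied to canary allocations during a deployment
        canary_tags = ["green"]
      }

      config {
        image = "registry.example.com/app:latest" # placeholder image
        ports = ["http"]
      }
    }
  }
}
```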
In summary, this pipeline will perform a canary deployment of the artifact, built by the build pipeline, to Nomad. After the deployment to Nomad is complete, the Consul service's service splitter is updated to split traffic 50/50 between the new green instances and the existing blue ones.
Pipeline 3: promotion-and-normalize-traffic
The third pipeline normalizes traffic to route to the blue instances and promotes the canary deployment in Nomad. Updating the service splitter is done similarly to how traffic was split in the prior pipeline.
The Consul config entry written in the normalize-traffic step of this pipeline directs 100% of traffic to blue instances once again, as depicted in the configuration below:
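A representative version of that entry, again assuming the service is named app:

```hcl
Kind = "service-splitter"
Name = "app"

Splits = [
  {
    # Route all traffic back to the blue subset
    Weight        = 100
    ServiceSubset = "blue"
  },
]
```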
The promote-deployment step will promote the Nomad canary deployment, which will update the new green instances to become the current active version of the Nomad job. This will make those instances blue, and the instances running the previous version of the Nomad job will be destroyed.
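As a hedged sketch of such a step, assuming it uses the pipeline exec step with an image that includes the Nomad CLI and that the job is named app (the Nomad address and any ACL token would come from the runner's environment or configuration):

```hcl
step "promote-deployment" {
  # Any image that ships the Nomad CLI works here
  image_url = "docker.io/hashicorp/nomad:latest"

  use "exec" {
    command = "nomad"
    # Promote all canary allocations of the job's latest deployment
    args    = ["job", "promote", "app"]
  }
}
```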
Blue-Green Deployment Pipeline Run
1st time deployment - Blue
The first thing to do is deploy the app to Nomad for the very first time. There will be no canary deployment, because it is the first one - canary deployments will happen on subsequent deployments, as per the update stanza of the Nomad job. The pipeline will be started using the command below, which assumes that a remote runner is already installed, with the correct configurations to connect to the Nomad cluster.
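Assuming the Waypoint project context is already selected, that invocation is roughly:

```shell-session
$ waypoint pipeline run build-and-blue-green-deployment
```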
At the conclusion of this pipeline run, the job is expected to be up and running in Nomad. This can be verified in Nomad directly, or with Waypoint!
The Nomad UI also shows that the job is up and running with one allocation.
The Waypoint UI indicates that the deployment is running too!
And of course, curl-ing the application at the address and port will say:
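Assuming the service is reachable through Consul DNS as app.service.consul and listens on port 8080 (both placeholders), that looks like:

```shell-session
$ curl http://app.service.consul:8080
Hello World!
```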
Deploy a new version - Green
With a very basic HTTP server reporting "Hello World!" deployed, it's time to make some changes to the application, like you might find in this commit. Here, the application is updated to no longer say "Hello World!", but to instead inform the user if a successful connection was made to a Postgres database. A few environment variables, listed below, will be read by the application to make this connection:
- USERNAME
- PASSWORD
- HOST
- PORT
- DBNAME
The config stanza in the Waypoint configuration file will set these variables in the application's environment dynamically. You can learn more about this in the Waypoint documentation on config variables.
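The example's actual config stanza is in the repository; purely as an illustration, if these values were sourced from Vault through Waypoint's dynamic config sourcing (the Vault path, keys, and static values below are made up), it could look something like:

```hcl
config {
  env = {
    # Fetched at runtime via the Vault config sourcer; path and key are
    # placeholders, and the key format depends on the KV engine version.
    USERNAME = dynamic("vault", {
      path = "secret/data/app/db"
      key  = "/data/username"
    })
    PASSWORD = dynamic("vault", {
      path = "secret/data/app/db"
      key  = "/data/password"
    })

    # Static values work here as well
    HOST   = "postgres.service.consul"
    PORT   = "5432"
    DBNAME = "app"
  }
}
```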
Any change could have been made to the application, not just a database connection. But with a new change in place and pushed to the git repository, it's time to rebuild the application, start the canary deployment, and split traffic to the new version of the application by starting the pipeline once more:
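As before, this assumes the same remote runner setup:

```shell-session
$ waypoint pipeline run build-and-blue-green-deployment
```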
This time around, the Nomad UI indicates that there is indeed a canary allocation, waiting to be promoted, but running alongside the allocation from the first deployment.
In Consul, the configuration entry for the service splitter has been updated to split traffic 50% between the blue and green service resolvers.
Since the new service instance is tagged green, Consul DNS enables users to target the instance with that tag using the tag name.
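For example, assuming Consul DNS is served on the default port 8600 and the app listens on port 8080 (both placeholders), the green instance can be resolved and queried by prefixing the tag to the service name:

```shell-session
$ dig @127.0.0.1 -p 8600 green.app.service.consul SRV

$ curl http://green.app.service.consul:8080
```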
At the same time, the old (blue) version of the application hasn't been taken down, so it still remains accessible to users.
Promote Deployment
With the canary deployment still awaiting promotion, both the blue and green instances of the application running, and the changes verified through testing, the deployment can be promoted. This will kill the allocations running the older version of the app, and the service splitter will be updated to route all traffic to the remaining instance, tagged blue.
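Promotion happens by running the third pipeline, for example:

```shell-session
$ waypoint pipeline run promotion-and-normalize-traffic
```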
In Nomad, it is evident that there are no more canary allocations, and the app is up and running.
In Consul, 100% of traffic is routed to the remaining blue instance, with the update of the service splitter.
With this deployment completed, all users of the application will be using the latest changes. As further changes are made to the application, the same pipelines run in this example can be run repeatedly, incrementally expanding the application's functionality while minimizing downtime and user impact.