Well-Architected Framework
Zero-downtime deployments
Zero-downtime deployment strategies aim to reduce or eliminate downtime when you update your infrastructure or applications. These strategies involve deploying new versions incrementally rather than all at once, so you can detect and resolve issues before they affect every user. Each strategy lets you test the new version in an environment with real user traffic, which helps validate the new release's performance and reliability.
This document set contains best practices for popular zero-downtime deployment methods, such as blue/green, canary, and rolling deployments. It helps you decide which deployment method best fits your organization and provides the resources to implement that method.
Note
Stateful workloads like databases require additional work for blue/green, canary, and rolling deployments. Consult your database's documentation while considering these zero-downtime strategies.
Deployment methods
Blue/green, canary, and rolling deployments all improve application reliability and reduce risk. While they share similar goals, each approach offers unique advantages that make it more suitable for certain types of applications or organizational needs. By choosing the most appropriate deployment method, companies can ensure smoother updates and reduce the likelihood of service disruptions.
- Blue/green deployments maintain two identical production environments concurrently. This method allows you to shift traffic from the current version (blue) to the upgraded version (green).
- Canary deployments introduce new versions incrementally to a subset of users. This approach lets you test upgrades with limited exposure, working alongside other deployment systems.
- Rolling deployments update applications gradually across multiple servers. This technique ensures only a portion of your infrastructure changes at once, reducing the risk of widespread issues.
The difference between these strategies is how and where the application deploys. This involves the environment the application runs in, cost considerations, deployment methods, and traffic direction.
Criteria | Blue/Green | Canary | Rolling |
---|---|---|---|
Environment Setup | Requires two nearly identical environments. | Requires two nearly identical environments. Starts with a small subset of users or servers. | Updates subsets of servers in batches. |
Traffic Switching | Switches all traffic at once. | Gradually increases traffic to the new version. | Sequentially updates and transitions traffic. |
Rollback Mechanism | Switches back to the blue environment. | Reduces or stops the canary deployment. | Reverts batches; can be more complex. |
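The traffic-switching row of the table can be illustrated with a minimal Python sketch. It models only the share of traffic (or servers) on the new version at each step; the function names, step counts, and canary weights are hypothetical and chosen for illustration, not taken from any specific tool.

```python
from typing import List


def blue_green_schedule(steps: int) -> List[int]:
    """Percent of traffic on the new version per step: everything moves at the last step."""
    return [0] * (steps - 1) + [100]


def canary_schedule(weights: List[int]) -> List[int]:
    """Percent of traffic on the new version follows operator-chosen weights ending at 100."""
    if not weights or weights != sorted(weights) or weights[-1] != 100:
        raise ValueError("canary weights must be non-decreasing and end at 100")
    return list(weights)


def rolling_schedule(servers: int, batch_size: int) -> List[int]:
    """Percent of servers running the new version after each batch is updated."""
    if servers < 1 or batch_size < 1:
        raise ValueError("servers and batch_size must be positive")
    schedule = []
    updated = 0
    while updated < servers:
        updated = min(updated + batch_size, servers)
        schedule.append(round(100 * updated / servers))
    return schedule


if __name__ == "__main__":
    print("blue/green:", blue_green_schedule(4))
    print("canary:    ", canary_schedule([5, 25, 50, 100]))
    print("rolling:   ", rolling_schedule(servers=8, batch_size=2))
```

In this sketch, blue/green stays at 0 percent until a single cutover, canary follows whatever weights the operator picks, and rolling advances in steps determined by the batch size.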
All three strategies aim for zero-downtime deployments and offer similar benefits, so the most important consideration when choosing among them is the type of change you plan to make: an infrastructure change or an application change.
Infrastructure changes involve setting up your environments so they are prepared to host your zero-downtime application. With blue/green deployments, you must have two identical environments. Setting up a green environment can range from creating a full new stack (servers, networking, or databases) to creating a new cluster to run containers, or adding a single green VM to an existing infrastructure stack.
Running two identical infrastructure environments can increase costs. To save money, you can run blue/green environments only in production. You should also have an infrastructure lifecycle strategy, such as using infrastructure as code to deploy your green environment only when you plan to deploy a new application version.
Application changes involve deploying your new application version and directing traffic to it. You can configure your load balancers or reverse proxies to direct traffic to your green stack, to perform canary testing, or to direct traffic in a controlled manner for rolling deployments.
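As a rough illustration of controlled traffic direction, the following Python sketch shows one way a proxy layer could decide which stack serves a request during canary testing. It assumes each request carries a user ID and hashes that ID into a stable bucket, so a given user stays on the same version as the percentage grows. The names `bucket_for`, `choose_backend`, and `green_percent` are hypothetical; real load balancers expose their own weighting configuration.

```python
import hashlib


def bucket_for(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def choose_backend(user_id: str, green_percent: int) -> str:
    """Return "green" for roughly green_percent of users and "blue" for the rest."""
    if not 0 <= green_percent <= 100:
        raise ValueError("green_percent must be between 0 and 100")
    return "green" if bucket_for(user_id) < green_percent else "blue"


if __name__ == "__main__":
    users = [f"user-{n}" for n in range(1000)]
    for percent in (0, 10, 50, 100):
        green = sum(choose_backend(u, percent) == "green" for u in users)
        print(f"{percent:>3}% target -> {green} of {len(users)} users on green")
```

Because the bucket depends only on the user ID, raising `green_percent` only ever moves users from blue to green, and setting it back to 0 returns everyone to blue, which mirrors the canary rollback described in the table.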
Service mesh deployments use service splitters to implement zero-downtime deployments. These components, often used in service mesh architectures, allow traffic to route between different versions of an application dynamically.
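To make the idea of a service splitter more concrete, here is a small Python model of one. It is a sketch under stated assumptions, not the API of any particular service mesh: the `ServiceSplitter` class, its `shift` and `route` methods, and the version names are all hypothetical. The model keeps a percentage weight per version, requires the weights to sum to 100, and routes a request based on a point in the range 0 to 100.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ServiceSplitter:
    """Hypothetical splitter: maps version names to integer traffic percentages."""
    weights: Dict[str, int]

    def __post_init__(self) -> None:
        self._validate(self.weights)

    @staticmethod
    def _validate(weights: Dict[str, int]) -> None:
        if any(w < 0 for w in weights.values()) or sum(weights.values()) != 100:
            raise ValueError("weights must be non-negative and sum to 100")

    def shift(self, source: str, target: str, percent: int) -> None:
        """Move up to `percent` points of traffic from source to target."""
        if percent < 0:
            raise ValueError("percent must be non-negative")
        moved = min(percent, self.weights.get(source, 0))
        updated = dict(self.weights)
        updated[source] = updated.get(source, 0) - moved
        updated[target] = updated.get(target, 0) + moved
        self._validate(updated)
        self.weights = updated

    def route(self, point: float) -> str:
        """Pick a version for a request, given a point in [0, 100)."""
        if not 0 <= point < 100:
            raise ValueError("point must be in the range [0, 100)")
        upper = 0
        for version, weight in self.weights.items():
            upper += weight
            if point < upper:
                return version
        raise RuntimeError("unreachable when weights sum to 100")


if __name__ == "__main__":
    splitter = ServiceSplitter({"v1": 100, "v2": 0})
    splitter.shift("v1", "v2", 10)   # start a canary on v2
    print(splitter.weights)
    splitter.shift("v2", "v1", 100)  # roll back: return all traffic to v1
    print(splitter.weights)
```

In practice the point passed to `route` would come from a random draw or a request hash. The design choice worth noting is that a rollback is only a weight change, so no application instances need to be redeployed to undo a release.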
Next steps
In this overview of zero-downtime deployments, you learned the benefits and tradeoffs of zero-downtime deployment techniques. Visit the following documents to learn specifics on infrastructure, application, and service mesh deployments. Zero-downtime deployments is part of the Define and automate processes pillar.