Well-Architected Framework
Decommission resources
Resource decommissioning is the process of safely removing or deleting infrastructure components, applications, or services that are no longer needed or have reached end-of-life. You should remove unused or obsolete resources such as servers, databases, images, and IAM resources.
When you decommission unused resources, you gain the following benefits:
- Reduce costs by removing charges associated with unused resources.
- Minimize security risks by removing outdated or vulnerable resources that bad actors can exploit.
- Reduce configuration drift by only running necessary resources.
- Improve audit and compliance by maintaining a smaller infrastructure footprint.
To successfully decommission resources, you need to create a well-defined plan that includes dependency analysis, stakeholder communication, and a gradual removal process. Depending on whether you manage your infrastructure manually or through automation, you may need to adjust your decommissioning approach.
Find resources to decommission
Before you begin decommissioning resources, you need to identify which resources exist in your environment and determine which ones are candidates for removal. This discovery phase helps you avoid accidentally removing resources that are still in use and ensures you target the right components for decommissioning.
Start by creating an inventory of your infrastructure. Most cloud providers offer resource tagging and billing reports that help identify unused or underutilized resources. Pay particular attention to active resources created for temporary purposes, like testing or proof-of-concepts.
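As a minimal sketch of how tagging can support this inventory, the following configuration uses the AWS provider's default_tags block to label every taggable resource that the provider creates. The tag keys and values (Environment, Owner, ExpiresOn) are hypothetical and only illustrate one possible convention:

```hcl
provider "aws" {
  region = "us-west-2"

  # default_tags applies these tags to every resource created through
  # this provider that supports tagging.
  default_tags {
    tags = {
      # Hypothetical tag scheme: adapt the keys and values to your organization.
      Environment = "proof-of-concept"
      Owner       = "platform-team"
      ExpiresOn   = "2025-12-31"
    }
  }
}
```

With a convention like this in place, you can filter billing reports or resource listings by tag to find temporary resources that are past their intended lifetime.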
Terraform tracks all infrastructure it manages in state files. You can run the terraform state list command to see all managed resources and the terraform show command to examine their current configurations. Use this list as a starting point to determine which resources are still in use and which ones you can decommission.
If you're using HCP Terraform, you can use the workspace explorer feature to gain visibility into the resources your organization manages with Terraform. The explorer provides a visual representation of your infrastructure, making it easier to identify resources that you no longer need.
Create a dependency plan
Your plan should analyze which services, applications, or other resources rely on the components you plan to remove. Your plan will lower the risk of unexpected outages by identifying and addressing dependencies before decommissioning.
If you are using infrastructure as code tools like Terraform, you can use a dependency graph to understand resource relationships. This graph can help you visualize connections between resources and identify potential impacts of removing specific components.
The following command creates a dependency graph of your Terraform resources:
$ terraform graph -type=plan | dot -Tpng > graph.png
Note
You need to install Graphviz on your system to run the dot command and generate visualizations. The terraform graph command itself only outputs the graph in DOT format. For more information on installing Graphviz, refer to the Graphviz installation guide.
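The edges in the dependency graph come from references between resources and from explicit depends_on arguments. The following sketch shows both kinds of dependency. The resource names and the bucket name are hypothetical and only illustrate the pattern:

```hcl
resource "aws_security_group" "web" {
  name = "web-sg"
}

resource "aws_s3_bucket" "logs" {
  bucket = "example-decommission-logs"
}

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"

  # Implicit dependency: referencing the security group ID tells Terraform
  # that the instance depends on the security group.
  vpc_security_group_ids = [aws_security_group.web.id]

  # Explicit dependency: no attribute references the bucket, so depends_on
  # declares the relationship directly.
  depends_on = [aws_s3_bucket.logs]
}
```

In this sketch, removing the security group or the bucket affects the instance, so you would decommission or update the instance first.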
Create a communication plan
Your plan should outline how you will inform stakeholders about the decommissioning process, including timelines and potential impacts. Effective communication prevents surprises and ensures all affected teams can prepare for the changes.
Start by identifying all stakeholders who might be affected by the decommissioning, including development teams, operations staff, end users, and business owners. Create a notification timeline that provides adequate warning. Your communications should explain what resources you are removing, when the decommissioning will occur, and what actions stakeholders need to take.
Create backups
Before decommissioning, confirm that you have backups of any critical data or configurations associated with the resources you are removing. Backups provide a safety net in case you need to roll back changes.
You may want to back up the following resources:
- Servers in the form of machine images
- Database snapshots
- Configuration files
- Metadata
Since Terraform uses infrastructure as code to manage resources, you can redeploy resources that you have previously decommissioned by reapplying your Terraform configuration. This capability allows you to recover resources quickly if needed.
For example, if you backed up a server as a machine image, you can redeploy it by setting the ami attribute in your Terraform configuration to the ID of your backed-up AMI, as in the following example:
resource "aws_instance" "example" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"
}
You can also use Terraform to create AWS EBS snapshots before decommissioning instances. The following example defines an EBS volume and creates a snapshot of that volume:
resource "aws_ebs_volume" "example" {
  availability_zone = "us-west-2a"
  size              = 40

  tags = {
    Name = "HelloWorld"
  }
}

resource "aws_ebs_snapshot" "example_snapshot" {
  volume_id = aws_ebs_volume.example.id

  tags = {
    Name = "HelloWorld_snap"
  }
}
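For database snapshots, one possible approach on AWS is to have Amazon RDS take a final snapshot when Terraform deletes the database instance. The following is a sketch under assumptions: the identifier, engine, sizing, and user name are hypothetical placeholders, and the relevant settings are skip_final_snapshot and final_snapshot_identifier.

```hcl
resource "aws_db_instance" "example" {
  # Hypothetical database settings for illustration only.
  identifier                  = "example-db"
  engine                      = "postgres"
  instance_class              = "db.t3.micro"
  allocated_storage           = 20
  username                    = "example_admin"
  manage_master_user_password = true

  # Take a final snapshot with this name when the instance is deleted.
  skip_final_snapshot       = false
  final_snapshot_identifier = "example-db-final-snapshot"
}
```

Apply these settings before you remove the database from your configuration so that they are recorded in state when Terraform destroys the instance.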
Gradually remove resources
Implement a phased approach to removing resources instead of doing it all at once. Start by redirecting traffic away from the resource, and monitor user traffic to ensure you don't negatively impact users.
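As one way to redirect traffic gradually on AWS, the following sketch uses weighted target groups on an Application Load Balancer listener. It assumes the load balancer and both target groups already exist. The three variable names are hypothetical inputs, not part of any existing configuration:

```hcl
# Hypothetical inputs that identify existing load balancer resources.
variable "load_balancer_arn" {
  type = string
}

variable "legacy_target_group_arn" {
  type = string
}

variable "replacement_target_group_arn" {
  type = string
}

resource "aws_lb_listener" "example" {
  load_balancer_arn = var.load_balancer_arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "forward"

    forward {
      # Send a small share of traffic to the resource being decommissioned.
      target_group {
        arn    = var.legacy_target_group_arn
        weight = 10
      }

      # Send the rest to its replacement.
      target_group {
        arn    = var.replacement_target_group_arn
        weight = 90
      }
    }
  }
}
```

You can lower the legacy weight in steps, monitor user traffic after each change, and remove the legacy target group once its weight reaches zero.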
You can use the terraform plan command to preview the changes that will occur when you remove resources from your configuration. This command helps you understand the impact of your changes before applying them.
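You can also set safeguards so you only decommission resources when you are ready. You can use Terraform's lifecycle block with prevent_destroy = true to prevent accidental deletion of critical resources. While this setting is in place, Terraform returns an error for any plan that would destroy the resource, so you must explicitly remove the prevent_destroy attribute before you can destroy it. The setting does not protect a resource if you remove its entire resource block from the configuration.
resource "aws_instance" "example" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"

  # The lifecycle block must be nested inside the resource it protects.
  lifecycle {
    prevent_destroy = true
  }
}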
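If you want Terraform to stop managing a resource without destroying the underlying infrastructure yet, a removed block is one option. The following sketch assumes Terraform 1.7 or later and assumes that you have deleted the aws_instance.example resource block from your configuration:

```hcl
# Remove aws_instance.example from Terraform state but leave the
# actual instance running.
removed {
  from = aws_instance.example

  lifecycle {
    destroy = false
  }
}
```

This lets you separate the step of removing a resource from your Terraform workflow from the step of deleting it, which can fit a phased decommissioning process.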
Consul can help you gradually remove resources by directing traffic away from services you are decommissioning. You can use Consul's service discovery and health checking features to monitor the status of services and ensure that dependent services are not affected during the decommissioning process.
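The following service-splitter configuration entry is a sketch of how this can look, under several assumptions: you use Consul service mesh, both services are registered, and their protocol is set to an HTTP-based protocol. The service names web-legacy and web-replacement are hypothetical.

```hcl
Kind = "service-splitter"
Name = "web-legacy"

Splits = [
  {
    # Keep a small share of requests on the service being decommissioned.
    # With no Service set, the split targets the service named above.
    Weight = 10
  },
  {
    # Shift the remaining requests to the replacement service.
    Weight  = 90
    Service = "web-replacement"
  },
]
```

You can apply a configuration entry like this with the consul config write command, then adjust the weights over time while you watch service health.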
If you are using orchestration tools like Nomad or Kubernetes, you can use their built-in capabilities to gracefully drain workloads before decommissioning nodes. Nomad provides node drain functionality through the nomad node drain command, which prevents scheduling new allocations on a node while safely migrating existing jobs to other available nodes. The Kubernetes kubectl drain command safely evicts pods from nodes while respecting Pod Disruption Budgets, which ensure that a minimum number of application replicas remain available throughout the process.
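In Nomad, a job's migrate block controls how its allocations move off a draining node. The following job specification is a minimal sketch. The job name, datacenter, and container image are hypothetical, and the migrate values are illustrative rather than recommended settings:

```hcl
job "web" {
  datacenters = ["dc1"]

  group "web" {
    count = 3

    # Controls how allocations are migrated when their node is drained.
    migrate {
      # Move one allocation at a time.
      max_parallel = 1

      # Treat a replacement as healthy once its tasks are running.
      health_check     = "task_states"
      min_healthy_time = "10s"
      healthy_deadline = "5m"
    }

    task "server" {
      driver = "docker"

      config {
        image = "nginx:1.27"
      }
    }
  }
}
```

With settings like these, Nomad waits for each replacement allocation to become healthy before it migrates the next one from the draining node.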
HashiCorp resources:
- Review the Zero-downtime deployments documentation for strategies on how to redirect traffic and disable functions gradually.
- Learn how to manage resource lifecycles with Terraform.
- Get up and running with Nomad by learning about scheduling, setting up a cluster, and deploying an example job.
- Learn the fundamentals of Consul.
Verify health of infrastructure and applications
After the decommissioning process, verify that the remaining infrastructure and applications are functioning correctly. Monitor system performance and user feedback to ensure that there are no negative impacts.
Complete the following steps after you decommission the resources:
- Validate APIs are functioning.
- Check application performance.
- Monitor system logs for errors.
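If you manage your infrastructure with Terraform, a check block is one way to automate part of this validation. The following sketch assumes Terraform 1.5 or later and the hashicorp/http provider. The URL is a hypothetical health endpoint:

```hcl
check "api_health" {
  # Scoped data source that requests the health endpoint on each
  # plan and apply.
  data "http" "api" {
    url = "https://api.example.com/health"
  }

  assert {
    condition     = data.http.api.status_code == 200
    error_message = "The API health endpoint did not return HTTP 200."
  }
}
```

A failed check produces a warning instead of blocking the run, so treat it as a complement to your monitoring, not a replacement for it.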
Next steps
In this section of Lifecycle management, you learned about decommissioning resources, including why you should plan decommissioning and how to safely execute the process. Decommission resources is part of the Optimize systems pillar.