Decommission infrastructure resources
Resource decommissioning is the process of safely removing infrastructure components, applications, or services that are no longer needed or have reached end-of-life. Without a decommissioning practice, unused resources accumulate, increase costs, and expand your security attack surface.
Why decommission resources
Decommissioning infrastructure resources addresses the following operational challenges:
Eliminate unnecessary spending: Unused resources continue to incur charges even when they provide no value. Servers, databases, and storage volumes left running after a project ends or a service is retired accumulate costs. A systematic decommissioning process ensures that you pay only for infrastructure that actively serves your organization.
Reduce security exposure: Outdated or unpatched resources give attackers an entry point into your environment. An abandoned VM running an old OS version or an orphaned IAM role with broad permissions creates an exploitable foothold. Removing unused resources reduces the surface area that attackers can target.
Reduce complexity and configuration drift: Running unnecessary resources adds complexity that makes your infrastructure harder to manage and understand. Every additional resource can drift from its intended state, fail silently, or create unexpected dependencies. Keeping your infrastructure footprint small improves visibility and simplifies audits and compliance reviews.
To decommission resources successfully, create a well-defined plan that includes dependency analysis, stakeholder communication, and a gradual removal process. You may need to adjust this approach depending on whether you manage your infrastructure manually or with automation tools.
Find resources to decommission
Before you begin decommissioning resources, you need to identify which resources exist in your environment and determine which ones to remove.
Start by creating an inventory of your infrastructure. Most cloud providers offer resource tagging and billing reports that help you identify unused or underutilized resources. Pay particular attention to resources created for temporary purposes, such as testing or proofs of concept.
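Consistent tags make temporary resources easier to find later. The following is a minimal sketch that assumes the AWS provider. The default_tags block applies a set of tags to every resource that provider creates. The tag keys and values (Owner, Purpose, ExpiresOn) are hypothetical examples, not a required schema.

```hcl
provider "aws" {
  region = "us-west-2"

  # Tags applied to every resource this provider creates.
  # Keys and values are hypothetical; follow your organization's tagging policy.
  default_tags {
    tags = {
      Owner     = "platform-team"
      Purpose   = "proof-of-concept"
      ExpiresOn = "2025-12-31"
    }
  }
}
```

You can then filter billing or inventory reports by these tags to find resources whose owner or expiry date indicates they are candidates for removal.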
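Terraform records the infrastructure it manages in state files. Run the terraform state list command to list all managed resources, and run terraform show to examine their current attributes. State covers only the resources Terraform manages, so combine it with your cloud provider's inventory to find resources created outside of Terraform.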
If you're using HCP Terraform, you can use the workspace explorer feature to see all resources your organization manages with Terraform. The explorer visualizes your infrastructure, helping you identify unused resources.
Create a dependency plan
Analyze which services, applications, or other resources depend on the components you plan to remove. Identifying and addressing dependencies before decommissioning lowers the risk of unexpected outages.
If you are using infrastructure as code tools like Terraform, you can use a dependency graph to understand resource relationships. The graph shows connections between resources and highlights the potential impact of removing specific components.
The following command generates a dependency graph of your Terraform resources and renders it as a PNG image. It requires the dot tool from Graphviz:
$ terraform graph -type=plan | dot -Tpng > graph.png
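The graph draws an edge wherever one resource references another. In the following sketch, the resource names and AMI ID are illustrative. Because the instance references the security group, Terraform creates the security group first and destroys it last. If you delete only the security group block, terraform plan fails with a reference error until you also remove or update the instance that depends on it.

```hcl
resource "aws_security_group" "web" {
  name = "web-sg"
}

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"

  # This reference creates an implicit dependency on the security group,
  # which appears as an edge in the dependency graph.
  vpc_security_group_ids = [aws_security_group.web.id]
}
```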
Plan stakeholder communication
Document how you will notify stakeholders about the decommissioning process, including timelines and potential impacts.
Start by identifying all stakeholders who might be affected by the decommissioning, including development teams, operations staff, end users, and business owners. Create a notification timeline that gives teams enough time to prepare. Your communications should explain what resources you are removing, when the decommissioning occurs, and what actions stakeholders need to take.
Back up data before decommissioning
Before decommissioning, confirm that you have backups of any critical data or configurations associated with the resources you are removing. Backups provide a safety net in case you need to roll back changes.
Depending on the resources you are removing, you may want to back up the following:
- Servers in the form of machine images
- Database snapshots
- Configuration files
- Metadata
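Because Terraform defines resources as code, you can redeploy resources that you previously decommissioned by restoring their configuration and running terraform apply again. Reapplying recreates the infrastructure but not the data it held, which is why you still need backups.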
For example, if you backed up a server as an Amazon Machine Image (AMI), you can redeploy it by setting the ami argument in your Terraform configuration to the ID of the backed-up AMI:
resource "aws_instance" "example" {
  # Replace with the ID of your backed-up AMI.
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"
}
Setting the ami argument to a backed-up image ID lets Terraform redeploy the instance from a known-good state with a single terraform apply. Changing ami forces a new resource, so Terraform does not update an existing instance in place. Instead, it destroys the old instance and creates a new one from the image. The other arguments in the resource block stay the same.
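Instead of hardcoding the image ID, you can look up the backed-up AMI with the aws_ami data source. This sketch assumes that your account owns the backup images and that they follow a hypothetical naming convention, example-backup-*.

```hcl
# Find the most recent backup image owned by this account.
# The "example-backup-*" name pattern is a hypothetical convention.
data "aws_ami" "backup" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "name"
    values = ["example-backup-*"]
  }
}

resource "aws_instance" "example" {
  ami           = data.aws_ami.backup.id
  instance_type = "t2.micro"
}
```

Because most_recent = true selects the newest matching image, a newer backup changes the AMI ID on the next plan, and Terraform then replaces the instance.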
You can also use Terraform to create Amazon EBS snapshots before you decommission instances. The following example defines an EBS volume and creates a snapshot of it:
resource "aws_ebs_volume" "example" {
  availability_zone = "us-west-2a"
  size              = 40

  tags = {
    Name = "HelloWorld"
  }
}

resource "aws_ebs_snapshot" "example_snapshot" {
  volume_id = aws_ebs_volume.example.id

  tags = {
    Name = "HelloWorld_snap"
  }
}
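EBS snapshots are incremental and store only blocks that contain data. They typically cost less to keep than the volume itself, and they give you a recoverable copy if you need to restore the resource later. When Terraform destroys an aws_ebs_snapshot resource, it deletes the snapshot. Before you destroy the rest of the configuration, remove any snapshots you want to keep from Terraform state, for example with terraform state rm.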
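To snapshot the root volume of an EC2 instance instead of a standalone volume, reference the volume ID that the aws_instance resource exports for its root block device. This is a minimal sketch that assumes the instance is managed in the same configuration as aws_instance.example.

```hcl
resource "aws_ebs_snapshot" "root_backup" {
  # root_block_device is exported as a list with a single element.
  volume_id = aws_instance.example.root_block_device[0].volume_id

  tags = {
    Name = "example-root-backup"
  }
}
```

Because this snapshot references the instance, running terraform destroy on this configuration destroys the snapshot before the instance.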
Gradually remove resources
Remove resources in phases instead of all at once. Start by redirecting traffic away from the resource. Then monitor traffic and error rates to confirm that users are not affected before you remove it.
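One way to redirect traffic gradually with Terraform is DNS weighted routing. This sketch assumes Amazon Route 53 and uses hypothetical names: var.zone_id, app.example.com, and the two load balancer hostnames. Lowering the old record's weight in steps shifts traffic to the new endpoint. At a weight of 0, Route 53 stops routing to the old endpoint, as long as another record in the set has a non-zero weight.

```hcl
variable "zone_id" {
  type        = string
  description = "Route 53 hosted zone ID (hypothetical input)"
}

# Endpoint being decommissioned: weight lowered to 0.
resource "aws_route53_record" "app_old" {
  zone_id        = var.zone_id
  name           = "app.example.com"
  type           = "CNAME"
  ttl            = 60
  set_identifier = "old"
  records        = ["old-lb.example.com"]

  weighted_routing_policy {
    weight = 0
  }
}

# Replacement endpoint: receives all weighted traffic.
resource "aws_route53_record" "app_new" {
  zone_id        = var.zone_id
  name           = "app.example.com"
  type           = "CNAME"
  ttl            = 60
  set_identifier = "new"
  records        = ["new-lb.example.com"]

  weighted_routing_policy {
    weight = 100
  }
}
```

Clients can keep using cached DNS answers until the TTL expires. Keep the old resource running for at least one TTL after the change, and confirm that its traffic has dropped before you remove it.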
Use terraform plan to preview the changes Terraform will make when you remove resources from your configuration. This shows you the impact before you apply anything. To preview destroying every resource in a configuration, run terraform plan -destroy.
You can also set safeguards so that you decommission resources only when you are ready. Use Terraform's lifecycle block with prevent_destroy = true to prevent accidental deletion of critical resources. With this setting, Terraform won't destroy the resource unless you explicitly remove the prevent_destroy argument.
resource "aws_instance" "example" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.micro"

  lifecycle {
    prevent_destroy = true
  }
}
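With prevent_destroy = true, Terraform returns an error for any plan that would destroy this instance. This includes plans created by terraform destroy and changes that force a replacement. To decommission the resource intentionally, set prevent_destroy to false or remove the lifecycle block, then run terraform destroy or remove the resource block. The setting applies only while the resource block exists: if you delete the entire block, lifecycle block included, Terraform plans to destroy the resource without an error.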
Use Consul's service discovery and health checking to redirect traffic away from services before you remove them, so you don't affect dependent services.
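Consul's DNS interface omits service instances whose health checks are critical, so a health check lets you take an instance out of rotation. The following agent service definition is a sketch with a hypothetical service name, port, and health endpoint. Before removal, you can run consul maint -enable -service=web -reason="decommissioning" on the agent. This puts the instance into maintenance mode, which marks it critical so that discovery stops returning it.

```hcl
service {
  name = "web"
  port = 8080

  # Consul polls this endpoint. If the check fails, the instance is marked
  # critical and DNS-based discovery stops returning it.
  check {
    id       = "web-http"
    name     = "HTTP health"
    http     = "http://localhost:8080/health"
    interval = "10s"
    timeout  = "1s"
  }
}
```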
If you use orchestration tools like Nomad or Kubernetes, you can use their built-in capabilities to drain workloads gracefully before you decommission nodes. The nomad node drain command stops Nomad from scheduling new allocations on a node and migrates the node's existing allocations to other available nodes. The Kubernetes kubectl drain command marks a node as unschedulable and evicts its pods while respecting Pod Disruption Budgets. Pod Disruption Budgets limit how many replicas of an application can be unavailable at once, so a minimum number remain available throughout the process.
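In Nomad, the migrate block in a job group controls how that group's allocations move off a draining node. This job is a sketch with a hypothetical job name, group, and container image. With max_parallel = 1, Nomad migrates one allocation at a time. It waits for each replacement to be considered healthy before it continues.

```hcl
job "web" {
  datacenters = ["dc1"]

  group "web" {
    count = 3

    # Controls how allocations move when their node is drained.
    migrate {
      max_parallel     = 1             # migrate one allocation at a time
      health_check     = "task_states" # healthy once all tasks are running
      min_healthy_time = "10s"         # must stay healthy for this long
      healthy_deadline = "5m"          # stop waiting for health after this
    }

    task "server" {
      driver = "docker"

      config {
        image = "nginx:1.25"
      }
    }
  }
}
```

With this job running, nomad node drain -enable <node-id> moves the group's allocations off the node one at a time.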
Verify infrastructure health post-decommissioning
After the decommissioning process, verify that the remaining infrastructure and applications are functioning correctly. Monitor system performance and user feedback to catch any unexpected issues.
After decommissioning, complete the following steps:
- Validate APIs are functioning.
- Check application performance.
- Monitor system logs for errors.
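If you use Terraform 1.5 or later, a check block can validate an API as part of every plan and apply. This sketch assumes the hashicorp/http provider and a hypothetical health endpoint. A failing assertion produces a warning and does not block the run.

```hcl
check "api_health" {
  # Scoped data source: queried each time Terraform evaluates the check.
  data "http" "api" {
    url = "https://api.example.com/health"
  }

  assert {
    condition     = data.http.api.status_code == 200
    error_message = "${data.http.api.url} returned an unhealthy status code."
  }
}
```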
HashiCorp resources
- Read the Terraform graph command documentation to generate dependency visualizations for your infrastructure.
- Learn to set up monitoring agents and configure dashboards and alerts.
- Review the Zero-downtime deployments documentation for strategies on how to redirect traffic and disable functions gradually.
- Learn how to manage resource lifecycles with Terraform.
- Get up and running with Nomad by learning about scheduling, setting up a cluster, and deploying an example job.
- Learn the fundamentals of Consul.
External resources
- Read the AWS guidance to implement a decommissioning process.
Next steps
In this section of Lifecycle management, you learned about decommissioning resources, including why you should plan decommissioning and how to safely execute the process.
To continue building your lifecycle management practices, refer to the following resources:
- Read the Automate cloud storage lifecycle policies guide
- Read the Tag cloud resources guide