Use Health Assessments to Detect Infrastructure Drift
Terraform Cloud's health assessments monitor your managed infrastructure to check that it still satisfies its intended configuration over its lifecycle. Over time, your resources may change outside of the Terraform workflow. This can be due to service failures or degradation, certificate expirations, or manual modification by other users. Terraform cannot prevent these changes, but health assessments help you detect them quickly so you can resolve them.
Health assessments include two types of checks:
- Drift detection verifies that your actual infrastructure settings match those recorded in your workspace's state file.
- Continuous validation verifies that your resources still satisfy any custom conditions defined in your configuration. Continuous validation is in beta.
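As an illustration of the kind of custom condition that continuous validation evaluates, here is a minimal sketch of a resource postcondition. The resource name, the vpc_id variable, and the CIDR block are hypothetical placeholders and are not taken from this tutorial's example repository.

```hcl
# Hypothetical input; supply the ID of an existing VPC.
variable "vpc_id" {
  type = string
}

# Hypothetical security group used only to illustrate a custom condition.
resource "aws_security_group" "example" {
  name   = "example_ssh"
  vpc_id = var.vpc_id

  ingress {
    description = "SSH from a private network"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/16"]
  }

  lifecycle {
    # The condition fails if any ingress rule allows traffic from anywhere.
    postcondition {
      condition = !contains(
        flatten([for rule in self.ingress : rule.cidr_blocks]),
        "0.0.0.0/0"
      )
      error_message = "Ingress rules must not allow traffic from 0.0.0.0/0."
    }
  }
}
```

Terraform evaluates a postcondition after it plans or applies the resource, so a failing condition surfaces as an error rather than as a silent misconfiguration.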
In this tutorial, you will enable health assessments for a workspace, use an on-demand assessment to detect configuration drift, and review the options for resolving drift.
Tip
Health assessments are available in Terraform Cloud Plus Edition. Refer to Terraform Cloud pricing for details.
Prerequisites
This tutorial assumes that you are familiar with the Terraform and Terraform Cloud workflows. If you are new to Terraform, complete the Get Started tutorials first. If you are new to Terraform Cloud, complete the Terraform Cloud Get Started tutorials first.
To complete this tutorial, you will need:
- A Terraform Cloud organization with the Plus edition.
- A Terraform Cloud user account with organization owner permissions, authenticated with Terraform Cloud on your local machine.
- Terraform v1.4+ installed locally.
- An AWS account.
- A Terraform Cloud variable set configured with your AWS credentials.
Clone example repository
Clone the example repository for this tutorial, which contains configuration for AWS networking components, an EC2 instance, and a security group.
Change to the repository directory.
Open main.tf to review the example configuration. The EC2 instance is configured to act as the single point of SSH ingress traffic to other instances within the network, also known as a bastion host.
Specifically, the aws_security_group.bastion resource restricts ingress traffic to a specific CIDR block, which can represent an organization's private network.
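The restriction described above might look roughly like the following sketch. The resource address, the bastion_ssh name, and the 192.80.0.0/16 CIDR block come from this tutorial. The vpc_id variable and the remaining arguments are assumptions for illustration and may differ from the repository's actual main.tf.

```hcl
# Assumption: the real configuration likely references a VPC resource directly.
variable "vpc_id" {
  type = string
}

resource "aws_security_group" "bastion" {
  name   = "bastion_ssh"
  vpc_id = var.vpc_id

  # Allow SSH only from the CIDR block that stands in for a private network.
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["192.80.0.0/16"]
  }
}
```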
Create infrastructure
First, set your Terraform Cloud organization name as an environment variable to configure your Terraform Cloud integration.
Initialize your configuration. As part of initialization, Terraform creates your learn-terraform-cloud-drift-detection Terraform Cloud workspace.
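Terraform knows which workspace to create because of the cloud block in the configuration. The following is a minimal sketch of such a block, assuming the organization is supplied through the TF_CLOUD_ORGANIZATION environment variable rather than hard-coded. The example repository's actual block may include additional settings.

```hcl
terraform {
  cloud {
    # No organization argument here: Terraform reads it from the
    # TF_CLOUD_ORGANIZATION environment variable when it is set.
    workspaces {
      name = "learn-terraform-cloud-drift-detection"
    }
  }
}
```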
Now, apply your configuration to create the infrastructure. Respond yes to the prompt to confirm the operation.
Introduce infrastructure drift
Although best practice is to use Terraform for all infrastructure changes to ensure consistent workflows and change visibility, your organization may occasionally need to make manual changes.
To simulate this, navigate to your security groups in the AWS console.
Find the bastion_ssh security group. Select the Inbound rules tab in the security group details, then click Edit inbound rules.
Delete the 192.80.0.0/16 source CIDR and replace it with 0.0.0.0/0. Then click Save rules.
Your security group's actual configuration no longer matches the settings recorded in your workspace's state file.
Enable health assessments
Terraform Cloud health assessments detect manual changes such as the one you introduced in the last section, so you maintain visibility into your actual infrastructure settings. A workspace must satisfy the following prerequisites in order to perform health assessments:
- Use Terraform 0.15.4+ to support drift detection, or 1.3.0+ to support both drift detection and continuous validation.
- The workspace must have at least one run, and its most recent run must have completed successfully.
- The workspace must use remote or agent execution mode.
Assessments use non-actionable, refresh-only plans. These runs compare the actual settings of your infrastructure against the resources tracked in your workspace’s state file. The assessments do not update your state or infrastructure configuration.
You can enable assessments on a specific workspace, or on all workspaces within your organization. Enable assessments on the workspace for this tutorial. First, navigate to your learn-terraform-cloud-drift-detection workspace. In the Health section, select Settings. Then, select Enable and click Save settings.
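If you manage workspaces as code instead of through the UI, the same setting can be expressed with the tfe provider. This is a sketch only, not a step in this tutorial. It assumes the workspace is managed by a separate Terraform configuration, and the tfc_organization variable is a hypothetical placeholder.

```hcl
terraform {
  required_providers {
    tfe = {
      source = "hashicorp/tfe"
    }
  }
}

# Hypothetical input; the name of your Terraform Cloud organization.
variable "tfc_organization" {
  type = string
}

resource "tfe_workspace" "drift_detection" {
  name         = "learn-terraform-cloud-drift-detection"
  organization = var.tfc_organization

  # Turn on health assessments for this workspace.
  assessments_enabled = true
}
```

Because this tutorial's workspace is created by the CLI during initialization, you would need to import it into that separate configuration before managing it this way.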
Once you enable assessments for a workspace, Terraform Cloud performs an assessment approximately once every 24 hours. These runs do not interfere with other Terraform operations in the workspace, and any new run for the workspace reschedules the next assessment for 24 hours later. If a run fails, Terraform Cloud pauses assessments until you resolve the issue and the workspace performs a successful run.
Trigger an on-demand assessment
You can use on-demand assessments to avoid waiting for scheduled assessments and detect drift sooner.
In the Health section of your learn-terraform-cloud-drift-detection workspace, click Start health assessment.
Terraform Cloud performs the assessment and displays the results. As expected, it detects the change you made to your bastion security group's inbound rules.
The assessment also detected changes to default values for some resource fields. The example Terraform configuration does not set values for these fields, but AWS sets default values when it provisions your resources. This is a common occurrence that depends on how the Terraform provider assigns unset values.
Terraform Cloud displays the assessment results on the workspace overview page.
You can also filter workspaces by assessment result on your organization's workspace landing page.
You can also configure workspace-specific notifications that alert you when an assessment produces a particular result, such as detected drift.
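One way to set up such a notification as code is the tfe provider's notification resource. The sketch below makes several assumptions: the webhook_url and workspace_id variables and the notification name are hypothetical, and the trigger names reflect the provider's assessment events as best understood here. Confirm them against the provider documentation for your version.

```hcl
# Hypothetical inputs; supply your own webhook endpoint and workspace ID.
variable "webhook_url" {
  type = string
}

variable "workspace_id" {
  type = string
}

resource "tfe_notification_configuration" "health" {
  name             = "health-assessment-alerts"
  enabled          = true
  destination_type = "generic"
  url              = var.webhook_url
  workspace_id     = var.workspace_id

  # Notify on drift, failed assessments, and failed custom conditions.
  triggers = [
    "assessment:drifted",
    "assessment:failed",
    "assessment:check_failure",
  ]
}
```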
Reconcile drift
For more complex configurations, drift can introduce unpredictability to operations. If your team deploys a new change to your infrastructure and only identifies the drift during the Terraform run, they may need to interrupt their workflow to decide how to proceed. If a change set includes many changes to resources, the operator needs to carefully review the execution plan to understand the drift.
Health assessments help you proactively detect drift and condition failures, so you can resolve them before they interfere with future operations. In this case, the infrastructure configuration is small, so it is easier to decide how to proceed.
When you identify infrastructure drift, you have two resolution options:
- If you wish to keep the change, you can update your configuration to reflect the new setting, then run a Terraform apply to update your state file.
- If you wish to discard the change, run a Terraform apply to overwrite it with the original setting from your configuration.
In this case, you should revert the manual security group change to prevent public ingress traffic to the bastion host.
In your example repository directory, run another terraform apply to update the security group to its initial configuration. Respond yes to the prompt to confirm the operation.
This run resets the next scheduled assessment for 24 hours later. Trigger another on-demand assessment to confirm that you resolved the infrastructure drift.
This time, Terraform Cloud does not detect any resource drift. You reverted your security group settings to the ones specified by your configuration, and Terraform updated the workspace state file to account for the provider-specific default and null value settings.
Destroy infrastructure
Destroy your infrastructure to avoid incurring unnecessary costs. Respond yes to the prompt to confirm the operation.
Optionally, delete your learn-terraform-cloud-drift-detection workspace from Terraform Cloud.
Next steps
In this tutorial, you enabled health assessments for a workspace and used an on-demand assessment to detect infrastructure drift. You also learned how health assessments can preserve predictability in your infrastructure operations.
Review the following resources to learn more about how you can ensure infrastructure conformity using Terraform and Terraform Cloud.
- Configure and use OPA policies for infrastructure compliance, to establish guarantees around your infrastructure configuration and workflows.
- Define custom resource conditions to establish configuration-level compliance checks.
- Use policies to control infrastructure costs.
- Use the Snyk Terraform Cloud run task to scan your infrastructure for security compliance.