Well-Architected Framework
Automatically detect resource drift and health
Infrastructure drift occurs when your actual infrastructure state differs from your Terraform configuration, often due to manual changes, cloud provider updates, or unauthorized modifications. This drift can cause deployment failures, security vulnerabilities, and operational inconsistencies that impact your application's reliability and performance.
Automated drift detection and health monitoring help you maintain infrastructure consistency by identifying discrepancies between your desired and actual state. This proactive approach prevents configuration mismatches from accumulating and helps keep your infrastructure compliant with your defined policies and security requirements.
Implementing effective drift detection requires continuous monitoring, automated validation, and clear remediation processes that help you quickly identify and resolve configuration inconsistencies.
Configure automated drift detection
Automated drift detection regularly compares your infrastructure's actual state with your Terraform configuration. This helps you catch unauthorized changes, cloud provider updates, and manual modifications that could impact your application's stability.
Use HCP Terraform health assessments to run periodic refresh-only plans that detect drift across your infrastructure. Enable health assessments on individual workspaces to monitor specific environments or applications, or enforce them for all eligible workspaces in your organization. When HCP Terraform detects drift, you can either run a new apply to restore the desired state or update your configuration to match the new state.
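If you manage HCP Terraform itself with the `tfe` provider, you can enable health assessments in code rather than through the workspace settings page. The following is a minimal sketch that assumes the `hashicorp/tfe` provider is already authenticated with a valid API token; the organization and workspace names are hypothetical placeholders.

```hcl
terraform {
  required_providers {
    tfe = {
      source = "hashicorp/tfe"
    }
  }
}

# Hypothetical organization name; replace with your own.
variable "organization" {
  type    = string
  default = "example-org"
}

# Workspace for a hypothetical production web application.
resource "tfe_workspace" "web_app_prod" {
  name         = "web-app-prod"
  organization = var.organization

  # Opt this workspace into health assessments, which run
  # refresh-only plans on a schedule to detect drift.
  assessments_enabled = true
}
```

To apply the setting across a whole organization instead, the `tfe_organization` resource exposes an `assessments_enforced` argument in recent provider versions.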
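Implement drift detection as part of your CI/CD pipeline to catch configuration issues before they reach production. Configure your pipeline to run drift checks after deployments and during regular maintenance windows to confirm ongoing compliance. For example, a scheduled pipeline job can run `terraform plan -refresh-only`, which reports changes made outside of Terraform without modifying your infrastructure or state.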
Implement custom health conditions
Custom health conditions allow you to validate your infrastructure beyond basic configuration compliance. These conditions can check application health, security posture, performance metrics, and business-specific requirements that are critical for your application's success.
Write custom conditions in Terraform to validate your resources with application-specific logic. For example, if your Terraform configuration deploys a web application, create a custom condition that sends an HTTP request to the application and verifies that it returns a healthy response. This confirms that your infrastructure not only exists but also functions correctly.
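The web application example can be sketched as a check block that reads the application's endpoint through the `http` data source from the `hashicorp/http` provider. The health endpoint URL is a hypothetical placeholder, and check blocks require Terraform 1.5 or later.

```hcl
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    http = {
      source  = "hashicorp/http"
      version = ">= 3.0.0" # status_code attribute is available from 3.0.0
    }
  }
}

check "web_app_health" {
  # Scoped data source: Terraform reads it only while evaluating this check.
  # The URL is a hypothetical health endpoint; replace it with your own.
  data "http" "web_app" {
    url = "https://app.example.com/health"
  }

  assert {
    condition     = data.http.web_app.status_code == 200
    error_message = "${data.http.web_app.url} returned HTTP ${data.http.web_app.status_code} instead of 200."
  }
}
```

If a failed condition should stop an apply rather than only report the problem, a `postcondition` inside a resource's `lifecycle` block enforces it as an error.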
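Use Terraform's check blocks to implement custom validation logic that runs during plan and apply operations. These checks can validate resource configurations, test connectivity, verify security settings, and ensure compliance with organizational policies. A failed check block assertion produces a warning rather than an error, so it reports problems without blocking the operation. HCP Terraform health assessments also re-evaluate these checks through continuous validation.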
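As an example of verifying a security setting, a check block can warn when a TLS certificate is close to expiring. This sketch assumes the `tls_certificate` data source from the `hashicorp/tls` provider and a hypothetical endpoint, and flags any certificate in the returned chain that expires within 30 days of the plan.

```hcl
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    tls = {
      source = "hashicorp/tls"
    }
  }
}

check "app_certificate_expiry" {
  # Hypothetical endpoint; replace with your application's URL.
  data "tls_certificate" "app" {
    url = "https://app.example.com"
  }

  assert {
    # plantimestamp() is known during planning, unlike timestamp(),
    # so this condition can be evaluated in plan and refresh-only runs.
    # "720h" is a 30-day buffer before expiry.
    condition = alltrue([
      for cert in data.tls_certificate.app.certificates :
      timecmp(cert.not_after, timeadd(plantimestamp(), "720h")) > 0
    ])
    error_message = "A certificate served by app.example.com expires within 30 days."
  }
}
```

Checking every certificate with `alltrue` avoids depending on the order in which the data source returns the chain.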
Monitor and respond to drift
Effective drift detection requires comprehensive monitoring and clear response procedures. Implement alerting and notification systems that immediately inform your team when drift is detected, enabling quick response to potential issues.
Configure different alert levels based on the severity and impact of detected drift. Critical drift that affects security or application availability should trigger immediate alerts, while minor configuration differences might generate lower-priority notifications for review during regular maintenance windows.
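In HCP Terraform, one way to route alerts by severity is to create separate notification configurations for different health assessment events. The sketch below assumes the `hashicorp/tfe` provider; the workspace ID and webhook URLs are hypothetical inputs, and treating failed checks as more urgent than drift is an illustrative policy choice rather than a requirement.

```hcl
terraform {
  required_providers {
    tfe = {
      source = "hashicorp/tfe"
    }
  }
}

# Hypothetical inputs; supply real values for your environment.
variable "workspace_id" {
  type = string
}

variable "review_slack_webhook_url" {
  type      = string
  sensitive = true
}

variable "urgent_webhook_url" {
  type      = string
  sensitive = true
}

# Lower priority: detected drift goes to a team channel for review.
resource "tfe_notification_configuration" "drift_review" {
  name             = "drift-review"
  enabled          = true
  destination_type = "slack"
  url              = var.review_slack_webhook_url
  triggers         = ["assessment:drifted"]
  workspace_id     = var.workspace_id
}

# Higher priority: failed checks or failed assessments go to a generic
# webhook, for example one that feeds an on-call paging tool.
resource "tfe_notification_configuration" "health_urgent" {
  name             = "health-urgent"
  enabled          = true
  destination_type = "generic"
  url              = var.urgent_webhook_url
  triggers         = ["assessment:check_failure", "assessment:failed"]
  workspace_id     = var.workspace_id
}
```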
Establish clear remediation procedures for different types of drift. Document the steps for investigating drift causes, determining whether to restore the desired state or update the configuration, and implementing the chosen resolution. This ensures consistent handling of drift incidents across your team.
Next steps
In this section of Monitor system health, you learned about implementing automated drift detection and health monitoring, including configuring automated detection, implementing custom health conditions, and monitoring and responding to drift. Automatically detect resource drift and health is part of the Optimize systems pillar.
Refer to the following documents to learn more about infrastructure monitoring:
- Identify common metrics to monitor the right performance indicators
- Monitor network traffic to track network performance and connectivity
If you are interested in learning more about drift detection and health monitoring, you can check out the following resources:
- HCP Terraform health assessments - Documentation for configuring health assessments
- Use health assessments to detect infrastructure drift - Tutorial for implementing drift detection
- Manage resource drift - Guide to managing and resolving drift
- Terraform custom conditions - Documentation for writing custom validation logic
- Use checks to validate infrastructure - Tutorial for implementing infrastructure validation