Continuous validation
We recommend that organizations in the scaling phase of the maturity model adopt continuous validation for their workspaces, especially to track the health of critical resources.
Continuous validation is the second component of the health check feature in HCP Terraform and Terraform Enterprise (TFE). Because health checks run regularly, they let infrastructure engineers detect and proactively fix issues before those issues impair the next Terraform apply operation.
While drift detection flags out-of-band changes to the managed infrastructure that can affect a Terraform apply operation, continuous validation addresses use cases where more customizable detection rules are necessary. Some examples of those use cases are:
- To flag issues with infrastructure elements that are managed outside of the workspace but affect the health of this workspace’s resources.
- To identify whether cloud services or third-party tools have detected issues with the managed infrastructure resources.
Please review the official documentation on continuous validation before implementing it.
Why Continuous Validation?
Failed infrastructure changes can be costly to the organization as they may introduce project delays and could expose the organization to operational or security risks. Adopting continuous validation gives infrastructure teams advance notice of issues preventing successful changes in configuration. These issues can then be addressed and failed infrastructure changes avoided.
Best Practice Recommendation
When a new workspace is created to manage infrastructure, continuous validation should be enabled (either explicitly at the workspace level or implicitly at the organization level).
The infrastructure-as-code engineer should also include in the Terraform configuration the logic needed to validate important infrastructure components whose health may change over time and cause the next Terraform apply run to fail when a configuration change is needed.
In addition, if infrastructure changes do fail in the future because of a condition that was not checked, an engineer should update the Terraform configuration to incorporate this new validation. If applicable, this new pattern should be applied to existing infrastructure code and added to the checklist for future Terraform configurations.
Rules of thumb for deciding which resources should have continuous validation applied:
- Check the status of any critical resource that can fail (e.g., a VM).
  - This is not necessary for certain resources, such as S3 buckets, that are native to the cloud provider.
- Check the validity of resources, such as certificates, that have user-defined time frames and whose expiry can affect the application stack.
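As a minimal sketch of these rules of thumb, the check blocks below assert that a VM is running and that a certificate is not about to expire. The AWS provider, the `instance_id` and `certificate_not_after` variables, and the 30-day threshold are assumptions chosen for illustration. In a real configuration these values would typically come from resources that the workspace manages.

```hcl
terraform {
  required_version = ">= 1.5.0" # check blocks and plantimestamp() need 1.5+

  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

# Hypothetical input: ID of a critical EC2 instance.
variable "instance_id" {
  type = string
}

# Hypothetical input: certificate expiry as an RFC 3339 timestamp.
variable "certificate_not_after" {
  type = string
}

check "vm_running" {
  # Scoped data source: read only to evaluate this check.
  data "aws_instance" "critical" {
    instance_id = var.instance_id
  }

  assert {
    condition     = data.aws_instance.critical.instance_state == "running"
    error_message = "Instance ${var.instance_id} is not in the running state."
  }
}

check "certificate_expiry" {
  assert {
    # Passes while the current time is earlier than 30 days (720h) before expiry.
    condition     = timecmp(plantimestamp(), timeadd(var.certificate_not_after, "-720h")) < 0
    error_message = "The certificate expires in less than 30 days."
  }
}
```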
Implementation guidance
In this section, we’ll go over the steps and recommendations to implement continuous validation.
Requirements
Continuous validation requires the use of Terraform language features that aren’t available in older versions of the runtime. The table below lists the language feature and the minimum version of the Terraform runtime that supports it.
Language feature | Version requirement | Useful links |
---|---|---|
Preconditions and postconditions | 1.2 and later | Terraform 1.2 Improves Exception Handling and Updates to the CLI-driven Workflow; Preconditions and postconditions |
Check block | 1.5 and later | Terraform 1.5 brings config-driven import and checks; Checks with assertions |
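One way to make these requirements explicit, offered here as a suggestion rather than a required step, is a `required_version` constraint in each configuration that uses these language features. Terraform then reports a version mismatch when the configuration is run with an older runtime.

```hcl
terraform {
  # Check blocks need Terraform 1.5 or later. A constraint of ">= 1.2.0"
  # is enough if the configuration only relies on preconditions and
  # postconditions.
  required_version = ">= 1.5.0"
}
```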
Permissions required
To configure continuous validation, you’ll need the following permissions:
- To change organization health settings, you must be a member of the owners team.
- To change a workspace’s health settings or trigger an on-demand health assessment, you must be an administrator for that workspace.
To view the continuous validation status:
- To view health status for a workspace, you need read access to that workspace.
- To view the status of all workspaces in the HCP Terraform Explorer, you need to be a member of the owners team or have the “View all workspaces” permission or better.
Enabling health assessments
Continuous validation is a component of the health assessment feature, so you must enable health assessments to use it.
You may enable health assessments at the organization level or at the workspace level, using the WebUI, the API, or Terraform code. Whenever possible, we recommend using Terraform code to configure your HCP Terraform or Terraform Enterprise instance.
Enable health assessments at the … | Using the WebUI | Using the API | Using Terraform |
---|---|---|---|
Organization level | Managing Settings section | Organizations API | TFE provider: tfe_organization resource |
Workspace level | Enable health assessments section | Workspaces API | TFE provider: tfe_workspace resource |
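For the organization-level option, a minimal sketch with the TFE provider could look like the following. The organization name and email address are placeholders, and an organization that already exists would need to be imported into state before Terraform can manage it.

```hcl
terraform {
  required_providers {
    tfe = {
      source = "hashicorp/tfe"
    }
  }
}

resource "tfe_organization" "this" {
  name  = "example-org"               # hypothetical organization name
  email = "platform-team@example.com" # hypothetical admin contact

  # Enforce health assessments on all eligible workspaces in the organization.
  assessments_enforced = true
}
```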
Before adopting continuous validation, review your inventory of workspaces, specifically the Terraform version each one is configured to use. Also review your code base for any Terraform required version constraints. Workspaces that do not meet the minimum requirements will not fully benefit from health assessments: drift detection may be available (support was added in Terraform 0.15.4), but continuous validation will not.
If you decide to enable health assessments at the workspace level and you are using Terraform code to configure your HCP Terraform or Terraform Enterprise instance, you can use the flexibility of the Terraform language to selectively activate health assessments on workspaces that meet the requirements.
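The selective approach can be sketched as follows, assuming each workspace pins an exact Terraform version (not a constraint string) and that Terraform 1.5 is the threshold you care about. The workspace names, versions, and organization name are hypothetical.

```hcl
terraform {
  required_providers {
    tfe = {
      source = "hashicorp/tfe"
    }
  }
}

locals {
  # Hypothetical inventory: workspace name => pinned Terraform version.
  workspaces = {
    "network-prod" = "1.6.6"
    "legacy-app"   = "1.1.9"
  }

  # True when the pinned version is 1.5 or later (major.minor comparison).
  supports_checks = {
    for name, tf_version in local.workspaces :
    name => (
      tonumber(split(".", tf_version)[0]) > 1 ||
      (tonumber(split(".", tf_version)[0]) == 1 && tonumber(split(".", tf_version)[1]) >= 5)
    )
  }
}

resource "tfe_workspace" "managed" {
  for_each = local.workspaces

  name              = each.key
  organization      = "example-org" # hypothetical organization name
  terraform_version = each.value

  # Enable health assessments only where check blocks are supported.
  assessments_enabled = local.supports_checks[each.key]
}
```

Because `assessments_enabled` governs the whole health assessment feature, a lower threshold may be preferable if drift detection alone is still useful on older workspaces.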
Defining custom assertions and checks
While continuous validation evaluates preconditions, postconditions, and check blocks as part of an assessment, we recommend using check blocks for post-apply monitoring. Unlike preconditions and postconditions, check blocks do not stop a Terraform run when they fail, which makes them ideal tools for monitoring.
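The difference can be illustrated with a minimal sketch that probes the same endpoint in two ways. The `hashicorp/http` provider and the `health_url` variable are assumptions made for this example.

```hcl
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    http = {
      source = "hashicorp/http"
    }
  }
}

# Hypothetical endpoint exposed by the managed infrastructure.
variable "health_url" {
  type    = string
  default = "https://example.com/health"
}

# Postcondition: a failed condition is an error and stops the run.
data "http" "blocking_probe" {
  url = var.health_url

  lifecycle {
    postcondition {
      condition     = self.status_code == 200
      error_message = "${var.health_url} did not return HTTP 200."
    }
  }
}

# Check block: a failed assertion is reported as a warning and the run continues.
check "service_health" {
  data "http" "probe" {
    url = var.health_url
  }

  assert {
    condition     = data.http.probe.status_code == 200
    error_message = "${var.health_url} did not return HTTP 200."
  }
}
```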
Here is a list of useful resources to learn more about the check block:
- Checks (documentation)
- Use checks to validate infrastructure (tutorial)
- Health checks with Terraform Cloud continuous validation (demo)
Here is a list of useful resources showcasing examples using the check block:
- Ensure your AWS account is within budget (AWS)
- Check GuardDuty for Threats (AWS)
- Check for unused IAM roles (AWS)
- Check EKS Cluster Instance Health and Availability (AWS)
- Check for EC2 Stopped Instances (AWS)
- Check if a VM is not running (Azure)
- Check if a Container App certificate will expire within a certain timeframe (Azure)
- Check if an App Service Function or Web App has exceeded its usage limit (Azure)
- Assert a VM is in a running state (GCP)
- Check if a certificate will expire within a certain timeframe (GCP)
- Validate the status of a Cloud Function (GCP)
Authentication considerations
Health checks require valid credentials in the workspace or run context. If you inject temporary credentials through external pipelines, those credentials may have expired by the time health or drift assessments run. In that case, we recommend adopting dynamic provider credentials, which provide the same benefits as temporary credentials while remaining compatible with health checks.
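As a hedged sketch for AWS, dynamic provider credentials are switched on through workspace environment variables, which can themselves be managed with the TFE provider. The `workspace_id` and `run_role_arn` inputs are hypothetical, and the OIDC identity provider and IAM role trust policy that must exist on the AWS side are not shown.

```hcl
terraform {
  required_providers {
    tfe = {
      source = "hashicorp/tfe"
    }
  }
}

# Hypothetical input: ID of the workspace that runs health assessments.
variable "workspace_id" {
  type = string
}

# Hypothetical input: ARN of the IAM role that runs should assume.
variable "run_role_arn" {
  type = string
}

# Tells the platform to obtain AWS credentials dynamically for each run.
resource "tfe_variable" "aws_provider_auth" {
  workspace_id = var.workspace_id
  key          = "TFC_AWS_PROVIDER_AUTH"
  value        = "true"
  category     = "env"
}

# Role assumed by plans, applies, and health assessments.
resource "tfe_variable" "aws_run_role_arn" {
  workspace_id = var.workspace_id
  key          = "TFC_AWS_RUN_ROLE_ARN"
  value        = var.run_role_arn
  category     = "env"
}
```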
Viewing health status and getting notified
You can view the health status of your workspaces in the following places:
- From the workspace summary page (the Health panel on the right side of the interface).
- From the workspace continuous validation status page (from the workspace page, go to Health > Continuous validation).
- From the Projects & Workspaces view.
- From the HCP Terraform Explorer.
You can also configure notifications to get alerted on a number of events, including:
- When a continuous validation check returns unknown or failed.
- When a health assessment cannot be completed successfully.
Notifications can be sent to a number of different supported destinations:
- Slack
- Microsoft Teams
- Webhooks (most flexible)
Notifications are a workspace-level configuration, which means that you need to configure them for every workspace that you want to monitor. If you are using Terraform code to configure your HCP Terraform or Terraform Enterprise instance, you can use the flexibility of the Terraform language to configure notifications on every workspace that needs them.
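For example, a single resource with `for_each` can attach the same health notification to many workspaces. This is a sketch under assumptions: the `workspace_ids` map and the `slack_webhook_url` variable are hypothetical inputs, and Slack is only one of the supported destinations.

```hcl
terraform {
  required_providers {
    tfe = {
      source = "hashicorp/tfe"
    }
  }
}

# Hypothetical input: workspace name => workspace ID.
variable "workspace_ids" {
  type = map(string)
}

# Hypothetical input: incoming webhook URL for the alert channel.
variable "slack_webhook_url" {
  type      = string
  sensitive = true
}

resource "tfe_notification_configuration" "health" {
  for_each = var.workspace_ids

  name             = "health-${each.key}"
  enabled          = true
  destination_type = "slack"
  url              = var.slack_webhook_url
  workspace_id     = each.value

  # Alert when a check fails or returns unknown, and when an assessment
  # itself cannot be completed.
  triggers = [
    "assessment:check_failure",
    "assessment:failed",
  ]
}
```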