Detect infrastructure drift and enforce OPA policies


As your organization grows and your infrastructure provisioning workflows mature, it gets harder to enforce consistency and best practices with training and hand-built tooling alone. Terraform can automatically check that your infrastructure satisfies industry best practices and organization-specific standards, with resource and module-specific conditions, workspace-specific run tasks, and workspace or organization-wide policies, including Open Policy Agent (OPA).

Note

HCP Terraform Free Edition includes one policy set of up to five policies. In HCP Terraform Plus Edition, you can connect a policy set to a version control repository or create policy set versions via the API. Refer to HCP Terraform pricing for details.

In this tutorial, you will use both Terraform preconditions and HCP Terraform native OPA support to validate configuration and enforce compliance with organizational practices. First, you will use Terraform preconditions to enforce network security conventions. Then, you will learn how to configure and enforce OPA policies in HCP Terraform, preventing infrastructure deployments on certain days of the week. Finally, you will use HCP Terraform's drift detection to detect when infrastructure settings have diverged from your written Terraform configuration.

Pre- and post-conditions help you define resource requirements in Terraform configurations. By including custom conditions in module definitions, you can ensure that downstream consumers comply with configuration standards, and use modules properly.

Prerequisites

This tutorial assumes that you are familiar with the Terraform and HCP Terraform workflows. If you are new to Terraform, complete the Get Started tutorials first. If you are new to HCP Terraform, complete the HCP Terraform Get Started tutorials first.

In order to complete this tutorial, you will need the following:

Terraform v1.4+ installed locally.
An AWS account.
An HCP Terraform account with HCP Terraform locally authenticated.
An HCP Terraform variable set configured with your AWS credentials.

Create example repository

Visit the template repository for this tutorial. Click the Use this template button and select Create a New Repository. Choose the GitHub owner that you use with HCP Terraform, and name the new repository learn-terraform-drift-and-opa. Leave the rest of the settings at their default values.

Clone example configuration

Clone your example repository, replacing USER with your own GitHub username. You will push to this fork later in the tutorial.

$ git clone https://github.com/USER/learn-terraform-drift-and-opa.git

Change to the repository directory.

$ cd learn-terraform-drift-and-opa

Review infrastructure configuration

This repository contains a local Terraform module that defines a network and bastion host, and a root configuration that uses the module. It also contains OPA policy definitions, which you will review later in this tutorial.

$ tree
.
├── README.md
├── main.tf
├── modules
│   └── network
│       ├── main.tf
│       ├── outputs.tf
│       └── variables.tf
├── opa
│   ├── policies
│   │   ├── friday_deploys.rego
│   │   └── public_ingress.rego
│   └── policies.hcl
├── terraform.auto.tfvars
├── terraform.tf
└── variables.tf

Open the modules/network/main.tf file in your code editor. This configuration uses the public vpc module to provision networking resources, including public and private subnets and a NAT gateway. It then launches a bastion host in one of the public subnets.

##...
resource "aws_security_group" "bastion" {
  name   = "bastion_ssh"
  vpc_id = module.vpc.vpc_id

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["192.80.0.0/16"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

resource "aws_instance" "bastion" {
  instance_type = var.bastion_instance_type
  ami           = data.aws_ami.amazon_linux.id

  subnet_id              = module.vpc.public_subnets[0]
  vpc_security_group_ids = [aws_security_group.bastion.id]
}

The bastion host is intended to be the single point of entry for any SSH traffic to instances within the VPC’s private subnets. The configuration also includes a security group that scopes any ingress SSH traffic to the bastion to just the 192.80.0.0/16 CIDR block, a hypothetical CIDR representing your organization’s network.

Though this configuration references this module locally, in a larger organization, you would likely publish it in your Terraform registry. By including a bastion in the boilerplate of your networking configuration, you can establish a standard for SSH access to instances in your networks.

Define a precondition

The network module defines a bastion_instance_type input variable to allow users to account for anticipated usage and workloads. While you want to allow users to specify an instance type, you do not want to allow them to provision an instance that is too big. You will add a precondition to verify that the instance type does not have more than 2 cores, to keep your operating costs low.

First, add the data source below to the module configuration. It accesses the instance type details, including the number of cores, from the AWS provider.

data "aws_ec2_instance_type" "bastion" {
  instance_type = var.bastion_instance_type
}

Now, add the precondition to the aws_instance.bastion resource definition.

resource "aws_instance" "bastion" {
  instance_type = var.bastion_instance_type
  ami           = data.aws_ami.amazon_linux.id

  subnet_id              = module.vpc.public_subnets[0]
  vpc_security_group_ids = [aws_security_group.bastion.id]

  lifecycle {
    precondition {
      condition     = data.aws_ec2_instance_type.bastion.default_cores <= 2
      error_message = "Change the value of bastion_instance_type to a type that has 2 or fewer cores to avoid over provisioning."
    }
  }
}

Terraform evaluates preconditions before provisioning the surrounding block. In this case, it will check whether the configuration satisfies the condition before provisioning the bastion instance.
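Postconditions follow the same pattern, but Terraform evaluates them after it plans or provisions the surrounding block, and they can refer to the resource's own attributes through self. The sketch below is illustrative only: the postcondition is a hypothetical addition that is not part of the example repository, and it assumes you want to guarantee that the bastion receives a public DNS name.

```hcl
resource "aws_instance" "bastion" {
  # ... existing arguments and the precondition from the step above ...

  lifecycle {
    # Hypothetical postcondition, not part of the tutorial repository.
    # Terraform checks it once the instance's attributes are known.
    postcondition {
      condition     = self.public_dns != ""
      error_message = "The bastion host must be in a VPC that has public DNS hostnames enabled."
    }
  }
}
```

A precondition guards the inputs to a resource, while a postcondition guards what the resource guarantees to anything that depends on it.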

Deploy infrastructure

Navigate back to the root level of the repository directory. The root Terraform configuration uses the network module to create a bastion host and networking components including a VPC, subnets, a NAT gateway, and route tables.

It sets the values for input variables in the terraform.auto.tfvars file. The initial value for the bastion instance type is t2.2xlarge, which has 8 cores and will fail the precondition as expected.

bastion_instance_type = "t2.2xlarge"
aws_region            = "us-east-2"

Set your HCP Terraform organization name as an environment variable to configure your HCP Terraform integration.

$ export TF_CLOUD_ORGANIZATION=

Tip

If multiple users in your HCP Terraform organization will run this tutorial, add a unique suffix to the workspace name in terraform.tf.

Initialize your configuration. As part of initialization, Terraform creates your learn-terraform-drift-and-opa HCP Terraform workspace.

$ terraform init
Initializing modules...
- network in modules/network
Downloading registry.terraform.io/terraform-aws-modules/vpc/aws 3.14.0 for network.vpc...
- network.vpc in .terraform/modules/network.vpc

Initializing HCP Terraform...

Initializing provider plugins...
- Reusing previous version of hashicorp/aws from the dependency lock file
- Installing hashicorp/aws v4.10.0...
- Installed hashicorp/aws v4.10.0 (signed by HashiCorp)

HCP Terraform has been successfully initialized!

You may now begin working with HCP Terraform. Try running "terraform plan" to
see any changes that are required for your infrastructure.

If you ever set or change modules or Terraform Settings, run "terraform init"
again to reinitialize your working directory.

Now, attempt to apply your configuration. The apply will fail because the instance size you specified is too big, and the precondition will return an error.

Note

This tutorial assumes that you are using a tutorial-specific HCP Terraform organization with a global variable set of your AWS credentials. Review the Create a Credential Variable Set tutorial for detailed guidance. If you are using a scoped variable set, assign it to your new workspace now.

$ terraform apply
Running apply in HCP Terraform. Output will stream here. Pressing Ctrl-C
will cancel the remote apply if it's still pending. If the apply started it
will stop streaming the logs, but will not stop the apply running remotely.

Preparing the remote apply...

To view this run in a browser, visit:
https://app.terraform.io/app/hashicorp-training/learn-terraform-opa/runs/run-nduWAtk2P6HjYQmd

Waiting for the plan to start...

Terraform v1.4.0
on linux_amd64
Initializing plugins and modules...
module.network.data.aws_ec2_instance_type.bastion: Reading...
module.network.data.aws_ami.amazon_linux: Reading...
module.network.data.aws_availability_zones.available: Reading...
module.network.data.aws_ec2_instance_type.bastion: Read complete after 0s [id=t2.2xlarge]
module.network.data.aws_availability_zones.available: Read complete after 0s [id=us-east-2]
module.network.data.aws_ami.amazon_linux: Read complete after 0s [id=ami-058d017bb0407da05]
╷
│ Error: Resource precondition failed
│
│   on modules/network/main.tf line 66, in resource "aws_instance" "bastion":
│   66:       condition     = data.aws_ec2_instance_type.bastion.default_cores <= 2
│     ├────────────────
│     │ data.aws_ec2_instance_type.bastion.default_cores is 8
│
│ Change the value of bastion_instance_type to a type that has 2 or fewer
│ cores to avoid over provisioning.
╵
Operation failed: failed running terraform plan (exit 1)

The t2.2xlarge instance type has 8 cores, so this Terraform run failed the precondition defined in the networking module. Overprovisioning the bastion would incur unnecessary cost for your organization.

Change the bastion_instance_type variable in terraform.auto.tfvars to t2.small.

bastion_instance_type = "t2.small"
aws_region            = "us-east-2"

Apply your configuration again. Respond yes to the prompt to confirm the operation.

$ terraform apply
Running apply in HCP Terraform. Output will stream here. Pressing Ctrl-C
will cancel the remote apply if it's still pending. If the apply started it
will stop streaming the logs, but will not stop the apply running remotely.

Preparing the remote apply...

To view this run in a browser, visit:
https://app.terraform.io/app/hashicorp-training/learn-terraform-opa/runs/run-kfkxYFvXtiEConPK

Waiting for the plan to start...

Terraform v1.4.0
on linux_amd64
Initializing plugins and modules...
module.network.data.aws_ec2_instance_type.bastion: Reading...
module.network.data.aws_ami.amazon_linux: Reading...
module.network.data.aws_availability_zones.available: Reading...
module.network.data.aws_ec2_instance_type.bastion: Read complete after 0s [id=t2.small]
module.network.data.aws_availability_zones.available: Read complete after 0s [id=us-east-2]
module.network.data.aws_ami.amazon_linux: Read complete after 0s [id=ami-058d017bb0407da05]

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:
Plan: 22 to add, 0 to change, 0 to destroy.


Do you want to perform these actions in workspace "learn-terraform-opa"?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

##...
Apply complete! Resources: 22 added, 0 changed, 0 destroyed.

Using a precondition to verify resource allocation lets you use the most up-to-date information from AWS to determine whether your configuration satisfies the requirement. While you could have also used variable validation to catch the violation, that would require researching all of the instance types and their capacities and listing every acceptable instance type in your configuration, making it less flexible.
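For comparison, here is a rough sketch of the variable validation alternative. The allow-list is an assumption made for illustration: the three instance types shown have 2 or fewer cores at the time of writing, but you would need to research and maintain the list yourself. The variable's description is also hypothetical and may differ from the one in the example repository.

```hcl
variable "bastion_instance_type" {
  description = "Instance type for the bastion host."
  type        = string

  validation {
    # Hypothetical allow-list. Each entry was looked up by hand and must be
    # updated whenever AWS adds or retires instance types.
    condition     = contains(["t2.micro", "t2.small", "t2.medium"], var.bastion_instance_type)
    error_message = "Set bastion_instance_type to one of the approved types: t2.micro, t2.small, or t2.medium."
  }
}
```

The validation fails early, before Terraform contacts AWS, but it only knows about the types you list. The precondition instead asks the provider for the core count of whichever type the user selects.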

Review OPA policy

Configuration-level validation, such as variable constraints and preconditions, lets you socialize standards from within your written configuration. However, module authors and users must voluntarily comply with the standards. Module authors must include conditions in module definitions, and users must consume those modules to provision infrastructure. To enforce infrastructure standards across entire workspaces or organizations, you can use OPA policies, which work without requiring your users to write their infrastructure configuration in a specific way.

Navigate to the opa directory in the example repository.

$ cd opa

Open the policies.hcl file to review the policy set configuration.

policy "friday_deploys" {
  query = "data.terraform.policies.friday_deploys.deny"
  enforcement_level = "mandatory"
}

policy "public_ingress" {
  query = "data.terraform.policies.public_ingress.deny"
  enforcement_level = "mandatory"
}

This policy set defines two policies, friday_deploys and public_ingress. It sets the enforcement level to mandatory on both, which prevents infrastructure provisioning in the event of policy failure. HCP Terraform policies also support an advisory enforcement level, which notifies users of failures but allows them to provision resources anyway. The query format references the package name declared in the policy file, and the name of the rule defined for the policy.
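As a variation that this tutorial does not use, the sketch below shows how the friday_deploys entry might look with the advisory enforcement level. Only the enforcement_level value differs from the policy set in the example repository.

```hcl
# Variation for illustration only; the tutorial keeps this policy mandatory.
policy "friday_deploys" {
  query             = "data.terraform.policies.friday_deploys.deny"
  enforcement_level = "advisory"
}
```

With this setting, a Friday deployment would still report the policy failure, but the run could proceed.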

Change to the policies directory to review the policy definitions.

$ cd policies

The public_ingress policy parses the planned changes for a Terraform run and checks whether they include security group updates to allow public ingress traffic from all CIDRs (0.0.0.0/0). This policy helps enforce your security posture by preventing the creation of any overly permissive security groups.

package terraform.policies.public_ingress

import input.plan as tfplan

deny[msg] {
  r := tfplan.resource_changes[_]
  r.type == "aws_security_group"
  r.change.after.ingress[_].cidr_blocks[_] == "0.0.0.0/0"
  msg := sprintf("%v has 0.0.0.0/0 as allowed ingress", [r.address])
}

In addition to placing guardrails on infrastructure configuration, you may wish to enforce standards around your organization’s workflows themselves. One common practice is to prevent infrastructure deployments on Fridays in order to lower the risk of production incidents before the weekend. The friday_deploys policy prevents infrastructure deployments on a certain day of the week.

In the friday_deploys.rego file, replace DAY with the current day of the week (e.g., Tuesday) to test that the policy blocks deploys today.

package terraform.policies.friday_deploys

deny[msg] {
  time.weekday(time.now_ns()) == "DAY"

  msg := "No deployments allowed on Fridays"
}

Stage your update to the policy.

$ git add .

Commit the change.

$ git commit -m "Update OPA policy"

Then, push your change.

$ git push

Create a policy set

HCP Terraform organizes policies in policy sets. Policy sets can contain either Sentinel or OPA policies. You can apply a policy set across an organization, or only to specific workspaces.

There are three ways to manage policy sets and their policies: VCS repositories, the HCP Terraform API, or directly through the HCP Terraform UI. In this tutorial, you will configure policy sets through VCS. The VCS workflow lets you collaborate on and safely develop and version your OPA policies, establishing the repository as the source of truth.

Navigate to your organization's Settings, then to Policy Sets. Click Connect a new policy set.

Select your GitHub version control integration.

Tip

Review the cloud VCS tutorial for detailed guidance on how to configure your VCS integration.

Select your fork of the learn-terraform-drift-and-opa repository.

On the Settings page:

Select OPA as the policy integration.
Under Policy set source, expand the More options drop down.
Set the Policies Path to opa.
Set the Scope of Policies to Policies enforced on selected workspaces.
Uncheck the box to allow overrides of failures.
Under Workspaces, select your learn-terraform-drift-and-opa workspace. Then, click Add workspace.

Tip

You can pin a policy set to a specific runtime version using the Runtime version drop-down. Policy runtime version management is currently in beta.

Finally, click Connect policy set.

Configured HCP Terraform OPA policy set
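If you prefer to manage the policy set as code instead of through the UI, the same settings could be approximated with the tfe_policy_set resource from the hashicorp/tfe provider. This is a sketch under assumptions, not a step in this tutorial: it presumes the tfe provider is already configured, and the three input variables are hypothetical names that do not appear in the example repository.

```hcl
# Hypothetical inputs; none of these names come from the tutorial repository.
variable "organization" {
  description = "Name of your HCP Terraform organization."
  type        = string
}

variable "oauth_token_id" {
  description = "ID of the OAuth token for your GitHub VCS connection."
  type        = string
}

variable "workspace_id" {
  description = "ID of the workspace to attach the policy set to."
  type        = string
}

resource "tfe_policy_set" "drift_and_opa" {
  name          = "learn-terraform-drift-and-opa"
  organization  = var.organization
  kind          = "opa" # OPA rather than Sentinel
  overridable   = false # do not allow overrides of failures
  policies_path = "opa" # directory that contains policies.hcl
  workspace_ids = [var.workspace_id]

  vcs_repo {
    # Replace USER with your own GitHub username.
    identifier     = "USER/learn-terraform-drift-and-opa"
    oauth_token_id = var.oauth_token_id
  }
}
```

Each argument mirrors one of the UI settings above: the policy integration, the override checkbox, the policies path, and the selected workspace.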

Trigger policy violation

The networking resources you provisioned earlier include a bastion host configured with a security group that restricts ingress traffic to your organization’s internal network. Imagine that an engineer is troubleshooting a production incident and tries to get around this restriction by making the security group more permissive.

To simulate this, update the ingress rule for the aws_security_group.bastion resource in modules/network/main.tf.

resource "aws_security_group" "bastion" {
  name   = "bastion_ssh"
  vpc_id = module.vpc.vpc_id

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

Navigate back to the root repository directory.

$ cd ../..

Run terraform apply to attempt to update the security group.

$ terraform apply
Running apply in HCP Terraform. Output will stream here. Pressing Ctrl-C
will cancel the remote apply if it's still pending. If the apply started it
will stop streaming the logs, but will not stop the apply running remotely.

Preparing the remote apply...

To view this run in a browser, visit:
https://app.terraform.io/app/hashicorp-training/learn-terraform-drift-and-opa/runs/run-j3fNsPw1RvwPfJQ9

Waiting for the plan to start...

Terraform v1.4.0
on linux_amd64
Initializing plugins and modules...
##...
Post-plan Tasks:

OPA Policy Evaluation

→→ Overall Result: FAILED
 This result means that one or more OPA policies failed. More than likely, this was due to the discovery of violations by the main rule and other sub rules
2 policies evaluated

→ Policy set 1: learn-terraform-drift-and-opa-template (2)
  ↳ Policy name: friday_deploys
     | × Failed
     | No description available
  ↳ Policy name: public_ingress
     | × Failed
     | No description available
╷
│ Error: Task Stage failed.

HCP Terraform detected the policy failures: the security group allows public ingress, and deploys are blocked today. The CLI output and run details in HCP Terraform list which policies failed.

OPA policy failure in HCP Terraform workspace

Using OPA policies in HCP Terraform, you prevented Terraform from creating resources that violate your infrastructure and organization standards.

Before moving on, fix your policy and configuration to allow a successful apply.

First, update the friday_deploys policy to check for deployments on Fridays. (If today is Friday, pick another day.)

package terraform.policies.friday_deploys

deny[msg] {
  time.weekday(time.now_ns()) == "Friday"

  msg := "No deployments allowed today."
}

Stage your update to the policy.

$ git add .

Commit the change.

$ git commit -m "Update OPA policy"

Then, push your change.

$ git push

Revert your change to the aws_security_group.bastion resource in modules/network/main.tf so that it reflects your actual infrastructure configuration.

resource "aws_security_group" "bastion" {
  name   = "bastion_ssh"
  vpc_id = module.vpc.vpc_id

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["192.80.0.0/16"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

Reapply your configuration to bring your workspace back into a healthy state.

$ terraform apply
##...
No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration and found no differences, so no
changes are needed.

Post-plan Tasks:

OPA Policy Evaluation

→→ Overall Result: PASSED
 This result means that all OPA policies passed and the protected behavior is allowed
2 policies evaluated

→ Policy set 1: learn-terraform-drift-and-opa-template (2)
  ↳ Policy name: friday_deploys
     | ✓ Passed
     | No description available
  ↳ Policy name: public_ingress
     | ✓ Passed
     | No description available

Introduce infrastructure drift

Note

Drift detection is available in HCP Terraform Plus Edition. Skip to the clean up step if you do not have access, or refer to HCP Terraform pricing for details.

Custom conditions, input validation, and policy enforcement help organizations maintain their standards at the time of resource provisioning. HCP Terraform can also check whether existing resources in Terraform state still match the intended configuration.

Returning to the hypothetical production incident, imagine that an engineer tries to work around the policy by making manual resource changes while troubleshooting.

To simulate this, navigate to your security groups in the AWS console.

Find the bastion_ssh security group. Select the Inbound rules tab in the security group details, then click Edit inbound rules.

Edit inbound security group rule

Delete the 192.80.0.0/16 source CIDR and replace it with 0.0.0.0/0. Then, click Save rules.

Update security group source CIDR

You have now introduced infrastructure drift into your configuration by managing the security group resource outside of the Terraform workflow.

Detect drift

HCP Terraform’s automatic health assessments help make sure that existing resources match their Terraform configuration. To do so, HCP Terraform runs non-actionable, refresh-only plans in configured workspaces to compare the actual settings of your infrastructure against the resources tracked in your workspace’s state file. The assessments do not update your state or infrastructure configuration.

Assessments include two types of checks, which you enable together. Drift detection determines whether resources have changed outside of the Terraform workflow. Health checks verify that any custom conditions you define in your configuration are still valid, for example checking if a certificate is still valid. You can enable assessments on specific workspaces, or across all workspaces in an organization. Assessments only run on workspaces where the last apply was successful. If the last apply failed, the workspace already needs operator attention. Make sure your last apply succeeded before moving on.
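To make the health check side concrete, the sketch below adds a custom condition that an assessment could re-evaluate after provisioning. It is a hypothetical addition to the bastion resource and is not part of the example repository. It assumes you want to be notified if someone stops the bastion outside of Terraform.

```hcl
resource "aws_instance" "bastion" {
  # ... existing arguments and precondition ...

  lifecycle {
    # Hypothetical condition for illustration. It passes at apply time
    # because the instance is running once creation completes.
    postcondition {
      condition     = self.instance_state == "running"
      error_message = "The bastion host is not running, so SSH access to the private subnets is unavailable."
    }
  }
}
```

The precondition you defined earlier in this tutorial is also a custom condition, so health checks would evaluate it as well.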

Navigate to your learn-terraform-drift-and-opa workspace in the HCP Terraform UI. Under the workspace's Settings, select Health.

Select Enable, then click Save settings.

Enable health assessments on TFC workspace

Shortly after you enable health assessments, the first assessment runs on the workspace. Subsequent assessments run once every 24 hours.

After a few minutes, Terraform will report failed assessments on the workspace overview page.

HCP Terraform drift detection

Click View Details to get more information. HCP Terraform detected the change to your ingress rule and reported what will happen on your next run if you do not update your configuration.

HCP Terraform drift detection

Note

Drift detection only reports on changes to the resource attributes defined in your configuration. To avoid accidental drift, explicitly set any attributes critical to your operations in your configuration, even if you rely on a provider's default value for that attribute.

The health assessments detected infrastructure drift. These checks ensure that your infrastructure configuration still matches the written configuration and satisfies any defined custom conditions, extending your validation coverage beyond just the time of provisioning. Fixing drift is a manual process, because you need to understand whether you want to keep the infrastructure changes made outside of Terraform, or overwrite them. In this case, you could run another Terraform apply to overwrite the security group update.

Clean up infrastructure

Destroy the resources you created as part of this tutorial to avoid incurring unnecessary costs. Respond yes to the prompt to confirm the operation.

$ terraform destroy
##...

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  - destroy

Terraform will perform the following actions:
##...
Plan: 0 to add, 0 to change, 22 to destroy.

Post-plan Tasks:

OPA Policy Evaluation

→→ Overall Result: PASSED
 This result means that all OPA policies passed and the protected behavior is allowed
2 policies evaluated

→ Policy set 1: learn-terraform-drift-and-opa-template (2)
  ↳ Policy name: friday_deploys
     | ✓ Passed
     | No description available
  ↳ Policy name: public_ingress
     | ✓ Passed
     | No description available

Do you really want to destroy all resources in workspace "learn-terraform-drift-and-opa"?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes
##...
Apply complete! Resources: 0 added, 0 changed, 22 destroyed.

Optionally, delete your learn-terraform-drift-and-opa workspace and OPA policy set from your HCP Terraform organization.

Next steps

In this tutorial, you used Terraform language features and HCP Terraform policies to make sure that your infrastructure matches your configuration and complies with your organization’s needs and standards. Configuration-level validation, such as preconditions, lets you specify standards within Terraform configurations. HCP Terraform policies let you enforce standards for an entire workspace or organization. You also used HCP Terraform health assessments to make sure that existing infrastructure still matched its Terraform configuration and had not changed outside of the Terraform workflow.

To learn more about how Terraform features can help you validate your infrastructure configuration, check out the following resources:

Review the OPA and policy documentation.
Learn how to configure and use health assessments to detect infrastructure drift.
Learn how to manage your infrastructure costs in HCP Terraform.
Learn how to use HCP Terraform run tasks and HCP Packer to ensure machine image compliance.
Review the health assessment documentation.
