Terraform
- Terraform Enterprise
- 1.0.x (latest)
- v202507-1
- v202506-1
- v202505-1
- v202504-1
- v202503-1
- v202502-2
- v202502-1
- v202501-1
- v202411-2
- v202411-1
- v202410-1
- v202409-3
- v202409-2
- v202409-1
- No versions of this document exist before v202408-1. Click below to redirect to the version homepage.
- v202408-1
- v202407-1
- v202406-1
- v202405-1
- v202404-2
- v202402-2
- v202402-1
- v202401-2
- v202401-1
- v202312-1
- v202311-1
- v202310-1
- v202309-1
- v202308-1
- v202307-1
- v202306-1
- v202305-2
- v202305-1
- v202304-1
- v202303-1
- v202302-1
- v202301-2
- v202301-1
- v202212-2
- v202212-1
- v202211-1
- v202210-1
- v202209-2
- v202209-1
- v202208-3
- v202208-2
- v202208-1
- v202207-2
- v202207-1
- v202206-1
Troubleshooting
This guide describes how to troubleshoot Terraform Enterprise.
Health check
Terraform Enterprise provides a /_health_check endpoint on the instance. If
Terraform Enterprise is up, the health check will return a 200 OK.
The /_health_check endpoint operates in 2 modes:
- Full check
- Minimal check
With a full check, the service will attempt to verify the status of internal
components and PostgreSQL, in contrast to a minimal check which returns 200
OK automatically after a successful check.
The endpoint's default behavior is to perform a full check during startup of the instance, and minimal checks after Terraform Enterprise is active and running.
To force a full check, include the additional query parameter ?full=1. This
parameter causes every call to make requests to internal components and
PostgreSQL, increasing system load and latency. Use it sparingly.
Support bundles
A support bundle is a collection of logs and other information about your installation that you can then send to HashiCorp Support for further troubleshooting.
You can generate a support bundle using the tfectl support bundle command.
See the tfectl
documentation
for more information.
Support bundle contents
Support bundles contain the following information:
- Logs for all Terraform Enterprise services.
- License information for the installation.
- The Terraform Enterprise environment configuration with redacted secrets.
- Additional diagnostic information about the container, such as the contents
of /etc/hosts, disk and memory usage, and network configuration.
Logs are available within the bundle under the host directory. All other
information is available within the results.json file located at the root of
the bundle.
Terraform Enterprise services
When you encounter application issues, understanding the role of individual services in Terraform Enterprise will help you troubleshoot.
- archivist- Object storage API which simplifies the service architecture and minimizes inner-network cross talk by colocating the logical storage and front-end API handler pieces.
- atlas- The Terraform Enterprise API.
- atlas-ui- The Terraform Enterprise user interface.
- backup-restore- A tool that provides both an API to backup and restore a Terraform Enterprise backup snapshot and a command line tool to inspect a backup snapshot. A backup snapshot is an encrypted binary file containing the Archivist data, Vault transit keys, and PostgreSQL schema dumps for a given Terraform Enterprise instance.
- licensing- A library and service that provides enterprise license functionality for Terraform.
- metrics- A Terraform Enterprise component to aggregate metrics and expose them over HTTP and HTTPS.
- nginx- The NGINX reverse proxy which facilitates access to the Terraform Enterprise services.
- outbound-http-proxy- Security control used to filter user-controlled network traffic (e.g. sentinel imports) and prevent them from accessing internal services directly.
- postgres- The PostgreSQL database holds relational data such as workspace applies and where their state is stored in object storage. An internal PostgreSQL service is started when the operational mode is- diskon Docker runtime. PostgreSQL server host config must be provided for application on cloud-managed Kubernetes, or for- externaland- active-activeoperational mode on Docker runtime.
- redis- An in-memory database, use for caching and- sidekiqqueue. An internal Redis service is started when the operational mode is- diskor- external. Redis server host config must be provided for- active-activemode for application on cloud-managed Kubernetes, or for- externaland- active-activeoperational mode on Docker runtime.
- registry_api- Terraform Private Module Registry API.
- sidekiq- Background job scheduler system.
- slug-ingress- Listens for VCS webhooks. Packages VCS repo data as a slug and sends it to- archivist.
- task-worker- A service that manages asynchronous units of work in Terraform Enterprise.
- terraform-registry-api- The API to the Terraform Registry.
- terraform-registry-worker- Processes VCS slugs and prepares modules to be published on the Terraform private Module Registry.
- terraform-state-parser- Reads Terraform state files and parses important information out of them. Terraform state is consumed from a remote state URL, and compiled data is sent in the payload of a callback to Atlas.
- tfe-health-check- This tool is to help our customers and us know exactly why Terraform Enterprise has gotten into an unhealthy state, checking the health and connections to Postgres, Redis, Vault, storage, etc.
- vault- HashiCorp Vault utilizes transit encryption for items such as sensitive workspace variables.
Service status
To check the status of the services, execute the following command within Terraform Enterprise container.
$ supervisorctl status
You will receive output similar to the following.
$ supervisorctl status
logs                             RUNNING   pid 39, uptime 1:38:49
postgres                         RUNNING   pid 103, uptime 1:38:48
redis                            RUNNING   pid 77, uptime 1:38:49
tfe:archivist                    RUNNING   pid 199, uptime 1:38:46
tfe:atlas                        RUNNING   pid 200, uptime 1:38:46
tfe:atlas-ui                     RUNNING   pid 201, uptime 1:38:46
tfe:backup-restore               RUNNING   pid 203, uptime 1:38:46
tfe:licensing                    RUNNING   pid 205, uptime 1:38:46
tfe:metrics                      RUNNING   pid 211, uptime 1:38:46
tfe:nginx                        RUNNING   pid 215, uptime 1:38:46
tfe:outbound-http-proxy          RUNNING   pid 220, uptime 1:38:46
tfe:sidekiq                      RUNNING   pid 238, uptime 1:38:46
tfe:slug-ingress                 RUNNING   pid 248, uptime 1:38:46
tfe:task-worker                  RUNNING   pid 257, uptime 1:38:46
tfe:terraform-registry-api       RUNNING   pid 265, uptime 1:38:46
tfe:terraform-registry-worker    RUNNING   pid 280, uptime 1:38:46
tfe:terraform-state-parser       RUNNING   pid 291, uptime 1:38:46
tfe:tfe-health-check             RUNNING   pid 298, uptime 1:38:46
tfe:vault                        RUNNING   pid 309, uptime 1:38:46
tfe-next                         RUNNING   pid 40, uptime 1:38:49
Inspecting logs
To inspect the logs for a particular service, execute the following command
within the Terraform Enterprise container where SERVICE_NAME is the name of a
Terraform Enterprise service.
$ cat /var/log/terraform-enterprise/SERVICE_NAME.log
For example, we can see why tfe:licensing exited.
$ cat /var/log/terraform-enterprise/licensing.log
{"@level":"info","@message":"initializing database","@module":"tfe-licensing","@timestamp":"2023-05-10T20:46:26.379084Z"}
{"@level":"error","@message":"error opening database connection","@module":"tfe-licensing","@timestamp":"2023-05-10T20:46:26.399064Z","error":"failed to connect to `host=/var/run/postgresql user=terraform-enterprise database=`: server error (FATAL: role \"terraform-enterprise\" does not exist (SQLSTATE 28000))"}
Kubernetes
Debug mode
Terraform Enterprise dispatches plans and applies via jobs when running inside Kubernetes. These jobs are removed immediately after their execution, which can make it hard to understand if a job failed due to cluster-specific errors.
To make troubleshooting easier in these scenarios, it is possible to keep the kubernetes jobs alive for a limited period of time,
after which they will get garbage collected by the cluster. To enable this, provide the following environment variables to
the deployment, either via the env.variables entry in the values.yaml override, or via the ConfigMap attached to the
deployment holding all of the environment variables.
- TFE_RUN_PIPELINE_KUBERNETES_DEBUG_ENABLED. Boolean flag to enable debug mode, set to- true.
- TFE_RUN_PIPELINE_KUBERNETES_DEBUG_JOBS_TTL. (Optional) time in seconds after which the jobs will get deleted; default is- 86400.
Troubleshooting common application errors
This section covers common application errors that you may run into.
Kubernetes Fails to Pull Image
Symptom
Kubernetes pods are failing to pull the container image with a BackOff error.
Signals
kubectl describe pod is stuck in the Waiting state with the ErrImagePull
reason.
$ kubectl describe pod terraform-enterprise-7f649f6598-2k79b
...
Containers:
  terraform-enterprise:
    State:          Waiting
      Reason:       ErrImagePull
...
Solution
Update the image pull policy for the deployment to always.
Empty S3 static credentials
Symptom
Application fails to start.
Signals
Logs show the following S3 prefix detection error.
2023-05-10T23:38:18.100Z [ERROR] terraform-enterprise: startup: error="failed detecting s3 prefix: could not list objects: operation error S3: ListObjectsV2, failed to sign request: failed to retrieve credentials: failed to refresh cached credentials, static credentials are empty"
Solution
Set TFE_OBJECT_STORAGE_S3_USE_INSTANCE_PROFILE to true when using IAM auth
for S3.
Unknown certificate with VCS integration
Symptom
You cannot configure a VCS connection within Terraform Enterprise.
Signals
Setting up VCS fails with unknown certificate issuer error.
Solution
Include the CA certificate for your VCS server in the CA Bundle. Ensure the
TFE_TLS_CA_BUNDLE_FILE is set to a path pointing to your CA bundle.
Unknown certificate with failing Terraform runs
Symptom
Terraform plans and applies fail.
Signals
Logs for task worker and archivist show an x509 error.
Solution
Include the CA certificates for all hosts that Terraform must communicate with,
including your Terraform Enterprise server itself, in the CA Bundle. Ensure the
TFE_TLS_CA_BUNDLE_FILE is set to a path pointing to your CA bundle.
Unable to fetch Terraform binary
Symptom
Terraform plans and applies fail with failed downloading terraform.
Signals
Terraform run logs contain.
Operation failed: failed fetching Terraform: failed downloading terraform: failed downloading "https://releases.hashicorp.com/terraform/1.3.2/terraform_1.3.2_linux_amd64.zip": GET https://releases.hashicorp.com/terraform/1.3.2/terraform_1.3.2_linux_amd64.zip giving up after 5 attempt(s): failed making temp file: open /tmp/terraform/8c23e18ed1846a552fc22ed5ee80eec9.download-67d5219a-aa5c-cd41-3262-2b9d57c1bfe2: read-only file system
Solution
Ensure the TFE_DISK_CACHE_PATH location is properly backed by a writable
volume.