Operational Excellence Implementation Resources
The operational excellence pillar recommends strategies that enable your organization to build and ship products quickly and efficiently, including changes, updates, and upgrades. These strategies help teams in your organization collaborate without delays or friction, even in failure scenarios. They include recommendations for both team and infrastructure architecture.
To implement our operational excellence recommendations, select a best practice and resource type below.
HashiCorp's operational excellence best practices expect that Vault and Consul are deployed in one of the following recommended configurations.
Vault infrastructure recommendations
This guide describes recommended best practices for infrastructure architects and operators to follow when deploying Vault using the Consul storage backend in a production environment.
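As a minimal sketch of that deployment pattern (not the full reference architecture), a Vault server configuration using the Consul storage backend might look like the following; the addresses and certificate paths are illustrative placeholders:

```hcl
# Vault server configuration (vault.hcl) using Consul as the storage backend.
# Addresses and certificate locations below are placeholders.
storage "consul" {
  address = "127.0.0.1:8500" # local Consul client agent
  path    = "vault/"         # KV prefix where Vault stores its data
}

listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/etc/vault.d/tls/vault.crt"
  tls_key_file  = "/etc/vault.d/tls/vault.key"
}

api_addr     = "https://vault.example.com:8200"
cluster_addr = "https://vault.example.com:8201"
```

In this pattern, each Vault server runs alongside a local Consul client agent, which forwards storage requests to the Consul server cluster.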
In this tutorial, you will architect your Vault clusters according to HashiCorp's recommended patterns and practices for replicating data.
This guide describes recommended best practices for infrastructure architects and operators to follow when deploying Vault using the Integrated Storage (Raft) storage backend in a production environment.
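For comparison, a sketch of a Vault server configuration using Integrated Storage follows; the node IDs, hostnames, and file paths are illustrative, and each cluster member needs a unique `node_id`:

```hcl
# Vault server configuration (vault.hcl) using Integrated Storage (Raft).
# Hostnames and paths below are placeholders.
storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault-node-1"

  # Automatically attempt to join an existing cluster member.
  retry_join {
    leader_api_addr = "https://vault-node-2.example.com:8200"
  }
}

listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/etc/vault.d/tls/vault.crt"
  tls_key_file  = "/etc/vault.d/tls/vault.key"
}

api_addr     = "https://vault-node-1.example.com:8200"
cluster_addr = "https://vault-node-1.example.com:8201"
```

Integrated Storage keeps the data on the Vault servers themselves, removing the operational dependency on a separate Consul cluster.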
Consul infrastructure recommendations
Consul provides a central control plane that helps you discover, connect, and track the state of services, enabling multi-cloud zero trust networking for your dynamic environments. Designing, deploying, and monitoring your Consul infrastructure correctly is therefore crucial to operational excellence.
The Consul reference architecture and the Consul multi-cluster reference architecture describe recommended best practices for infrastructure architects and operators to follow when deploying Consul in a production environment.
If you have Kubernetes (K8s) workloads, you should review the K8s reference architecture and the K8s deployment tutorial. Furthermore, the Secure Consul on K8s tutorial will help you keep Consul resilient and secure. Lastly, the Observability tutorial shows you how to visualize workload metrics.
Disaster recovery is critical for highly available applications, and organizations should plan and prepare for it in advance. The disaster recovery considerations tutorial, disaster recovery on K8s tutorial, and the disaster recovery for multi-cluster deployments tutorial all shed more light on preparing for worst-case scenarios.
Proactive Consul monitoring can help lower the risk of disaster recovery incidents. The Monitor Consul health tutorial explains key telemetry metrics to track, and the Monitor Consul with Telegraf tutorial guides you through setting up Telegraf to collect Consul metrics.
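As a sketch of the telemetry side of that setup, a Consul agent can stream its metrics to a local Telegraf statsd input; the address below assumes Telegraf is listening on its default statsd port:

```hcl
# Fragment of a Consul agent configuration that streams metrics
# to a Telegraf statsd input listening on the default port 8125.
telemetry {
  dogstatsd_addr   = "127.0.0.1:8125"
  disable_hostname = true # avoid prefixing every metric with the hostname
}
```

Telegraf can then aggregate these metrics and forward them to the monitoring backend of your choice.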
Packer infrastructure recommendations
Packer complements Terraform, among other tools, by providing immutable images such as AMIs or containers that Terraform can ingest to create infrastructure. Packer configures the images by running one or more provisioners, letting users create ready-to-use artifacts. Provisioners use built-in and third-party software to install and configure the machine images. These images let teams run immutable infrastructure by removing the need to install components at runtime, allowing applications to scale rapidly in response to an increase in demand. It is important to configure immutable images correctly and consistently, and to ensure the images are reliably available to Terraform.
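To illustrate, a minimal Packer HCL template might bake a web server into an Ubuntu-based AMI with a shell provisioner; the plugin version, region, and installed packages are illustrative choices:

```hcl
packer {
  required_plugins {
    amazon = {
      version = ">= 1.2.0"
      source  = "github.com/hashicorp/amazon"
    }
  }
}

source "amazon-ebs" "base" {
  ami_name      = "app-base-{{timestamp}}"
  instance_type = "t3.micro"
  region        = "us-east-1"
  ssh_username  = "ubuntu"

  # Select the most recent Canonical Ubuntu 22.04 AMI as the base image.
  source_ami_filter {
    filters = {
      name                = "ubuntu/images/*ubuntu-jammy-22.04-amd64-server-*"
      virtualization-type = "hvm"
    }
    most_recent = true
    owners      = ["099720109477"] # Canonical
  }
}

build {
  sources = ["source.amazon-ebs.base"]

  # Install everything at build time so instances boot ready to serve.
  provisioner "shell" {
    inline = [
      "sudo apt-get update",
      "sudo apt-get install -y nginx",
    ]
  }
}
```

Running `packer build` against this template produces an AMI that Terraform can then reference when provisioning instances.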
HCP Packer stores metadata about your Packer images so that you can track updates, use the most up-to-date base images, and deploy the most up-to-date downstream images. This bridges the gap between image factories and image deployments, allowing development and security teams to work together to create, manage, and consume images in a centralized way.
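For example, adding an `hcp_packer_registry` block to a Packer build publishes the resulting image's metadata to an HCP Packer bucket; the bucket name and source label here are assumptions:

```hcl
# Inside a Packer build block: publish image metadata to HCP Packer.
# HCP_CLIENT_ID and HCP_CLIENT_SECRET must be set in the environment.
build {
  sources = ["source.amazon-ebs.base"] # illustrative source label

  hcp_packer_registry {
    bucket_name = "app-base" # illustrative bucket name
    description = "Base web server image"
  }
}
```

Downstream Terraform configurations can then query the registry for the latest image in a channel rather than hard-coding image IDs.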
We offer hands-on tutorials for practitioners, from creating your first Packer image, to building a golden image pipeline using HashiCorp's HCP Packer registry. You can also learn how to prevent access to outdated machine images and enforce image compliance using Terraform and HCP Packer.
Architect and automate infrastructure
Terraform Cloud supports automated deployments based on changes to version control through features such as its integration with GitHub Actions.
The TFE Terraform provider can codify your Terraform Cloud workspaces, teams, and processes.
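As a minimal sketch, the TFE provider can manage a workspace, a team, and the team's access to that workspace as code; the organization, workspace, and team names are placeholders:

```hcl
terraform {
  required_providers {
    tfe = {
      source = "hashicorp/tfe"
    }
  }
}

# Authentication is read from the TFE_TOKEN environment variable.
provider "tfe" {}

resource "tfe_workspace" "networking" {
  name         = "networking-prod" # illustrative workspace name
  organization = "example-org"
}

resource "tfe_team" "platform" {
  name         = "platform"
  organization = "example-org"
}

# Grant the platform team write access to the networking workspace.
resource "tfe_team_access" "platform_networking" {
  access       = "write"
  team_id      = tfe_team.platform.id
  workspace_id = tfe_workspace.networking.id
}
```

Managing these resources in Terraform keeps workspace and team configuration reviewable and reproducible, like any other infrastructure.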
Follow the Automate monitoring with the Terraform Datadog provider tutorial to deploy an application to a Kubernetes cluster and install the Datadog agent across the cluster. The Datadog agent reports the cluster health back to your Datadog dashboard.
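One common way to install the agent is through the Terraform Helm provider; a sketch follows, where `var.datadog_api_key` is an assumed input variable and the provider is configured elsewhere:

```hcl
# Install the Datadog agent across a Kubernetes cluster via Helm.
# Assumes the Helm provider is configured against the target cluster.
resource "helm_release" "datadog_agent" {
  name             = "datadog"
  repository       = "https://helm.datadoghq.com"
  chart            = "datadog"
  namespace        = "datadog"
  create_namespace = true

  set_sensitive {
    name  = "datadog.apiKey"
    value = var.datadog_api_key # illustrative input variable
  }
}
```

Because the release is managed by Terraform, agent upgrades and configuration changes go through the same plan-and-apply workflow as the rest of your infrastructure.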
Use application load balancers for blue-green and canary deployments. Provision blue and green environments, add feature toggles to your Terraform configuration to define a list of potential deployment strategies, conduct a canary test, and incrementally promote your green environment.
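A sketch of the traffic-shifting piece of that workflow follows, using a weighted forward action on an AWS application load balancer listener; the load balancer and blue/green target groups are assumed to be defined elsewhere:

```hcl
# Feature toggle controlling the share of traffic sent to the green environment.
variable "green_traffic_weight" {
  description = "Percentage of traffic (0-100) routed to the green environment"
  type        = number
  default     = 0
}

resource "aws_lb_listener" "app" {
  load_balancer_arn = aws_lb.app.arn # assumed to exist elsewhere
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "forward"

    forward {
      target_group {
        arn    = aws_lb_target_group.blue.arn # assumed blue environment
        weight = 100 - var.green_traffic_weight
      }
      target_group {
        arn    = aws_lb_target_group.green.arn # assumed green environment
        weight = var.green_traffic_weight
      }
    }
  }
}
```

Setting the variable to a small value (for example, 10) runs a canary test against the green environment, and raising it to 100 completes the cutover.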
Understand the core Terraform workflow, how it evolves when a team is collaborating on infrastructure, and how Terraform Cloud enables this workflow to run smoothly for entire organizations.
This curated list of HashiCorp learning resources helps practitioners and organizations better understand the cloud operating model.