Capacity planning is often overlooked when organizations architect and deploy solutions. You can better allocate hardware resources for Consul when you have a good understanding of what it does. This article will guide you through several considerations you should keep in mind when deploying and maintaining a Consul cluster.
It is important to select the correct size for your server instances. Consul server environments have a standard set of minimum requirements. However, these requirements may vary depending on what you are using Consul for.
Insufficient resource allocations or networking issues often cause generally degraded performance. Eventually, the Consul leader node will not be able to respond to requests in time. When the leader does not respond, the Consul cluster triggers a re-election, pausing all requests and updates until the election ends.
The minimum hardware requirements for Consul servers in production clusters as recommended by the reference architecture are:
|CPU||Memory||Disk Capacity||Disk IO||Disk Throughput||Avg Round-Trip-Time||99% Round-Trip-Time|
|8-16 core||32-64 GB RAM||200+ GB||7500+ IOPS||250+ MB/s||Lower than 50ms||Lower than 100ms|
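As a rough illustration, the lower bounds in the table above can be encoded as a pre-deployment check. This is only a sketch: the `ServerSpec` class and the `unmet_requirements` function are hypothetical names introduced here, and the thresholds are copied from the table rather than read from any Consul API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServerSpec:
    """Hypothetical description of one candidate Consul server instance."""
    cpu_cores: int
    memory_gb: int
    disk_gb: int
    disk_iops: int
    disk_throughput_mbps: int
    avg_rtt_ms: float
    p99_rtt_ms: float


def unmet_requirements(spec: ServerSpec) -> List[str]:
    """Return the table columns whose minimum the spec does not meet."""
    checks = [
        ("CPU", spec.cpu_cores >= 8),
        ("Memory", spec.memory_gb >= 32),
        ("Disk Capacity", spec.disk_gb >= 200),
        ("Disk IO", spec.disk_iops >= 7500),
        ("Disk Throughput", spec.disk_throughput_mbps >= 250),
        # The table asks for round-trip times strictly lower than these values.
        ("Avg Round-Trip-Time", spec.avg_rtt_ms < 50),
        ("99% Round-Trip-Time", spec.p99_rtt_ms < 100),
    ]
    return [name for name, ok in checks if not ok]


# Example: too few cores and a slow 99th percentile round trip.
candidate = ServerSpec(4, 32, 200, 7500, 250, 10.0, 120.0)
print(unmet_requirements(candidate))  # ['CPU', '99% Round-Trip-Time']
```

An empty list means the instance meets every minimum in the table. It does not mean the instance is large enough for your particular workload.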
We recommend starting from the following instances (or similar) and scaling up as needed. We also recommend avoiding "burstable" CPU and storage options where performance may drop after a consistent load.
|Provider||Size||Instance/VM Types||Disk Volume Specs|
For HashiCorp Cloud Platform (HCP) Consul, cluster size is measured in the number of service instances supported. Find out more on the HCP Consul Pricing page.
Workloads are any actions that interact with the Consul cluster. These actions consist of key/value reads and writes, service registrations and deregistrations, adding or removing Consul client agents, and more.
Input/output operations per second (IOPS) is a unit of measurement for the number of reads and writes to non-adjacent storage locations. For high workloads, ensure that the Consul server disks support a high number of IOPS to keep up with the rapid Raft log update rate. For virtual instances in cloud environments, unlike bare-metal environments, IOPS is often tied to storage sizing: the more storage you provision, the more IOPS you get. Therefore, we recommend deploying on IOPS-optimized instances.
Consul server agents are generally I/O bound for writes and CPU bound for reads. For additional tuning, refer to the raft tuning section.
When planning for memory requirements, you should allocate RAM for server agents to contain 2 to 4 times the working set size. You can determine the working set size of a running cluster by noting the value of consul.runtime.alloc_bytes in the leader node’s telemetry data. Inspect your monitoring solution for the telemetry value, or run the following commands with the jq tool installed on your Consul leader instance.
Tip: For Kubernetes, execute the command from the leader pod. jq is available in the Consul server containers.
First, export your ACL token.
$ export CONSUL_HTTP_TOKEN=
Then, retrieve the working set size.
$ curl --silent --header "X-Consul-Token: $CONSUL_HTTP_TOKEN" http://127.0.0.1:8500/v1/agent/metrics | jq '.Gauges[] | select(.Name=="consul.runtime.alloc_bytes") | .Value'
616017920
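The 2 to 4 times rule can be turned into a small calculation. The sketch below assumes the metrics payload has already been fetched and decoded into a dictionary with a `Gauges` list of `Name`/`Value` entries. The function names are hypothetical, and the sample payload is hand-written and trimmed to the one gauge of interest.

```python
def working_set_bytes(metrics: dict) -> float:
    """Extract consul.runtime.alloc_bytes from a decoded metrics payload."""
    for gauge in metrics.get("Gauges", []):
        if gauge.get("Name") == "consul.runtime.alloc_bytes":
            return gauge["Value"]
    raise KeyError("consul.runtime.alloc_bytes not found in Gauges")


def recommended_ram_bytes(working_set: float) -> tuple:
    """Return the (low, high) RAM range: 2x to 4x the working set size."""
    return 2 * working_set, 4 * working_set


# Hand-written sample using the value from the example output above.
sample = {"Gauges": [{"Name": "consul.runtime.alloc_bytes", "Value": 616017920}]}
low, high = recommended_ram_bytes(working_set_bytes(sample))
print(f"{low / 1e9:.2f} GB to {high / 1e9:.2f} GB")  # 1.23 GB to 2.46 GB
```

The range covers only what the Consul process needs for its working set. The minimum hardware requirements listed earlier still apply.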
For Kubernetes deployments, when setting up persistent volume (PV) resources, you should define the correct server storage class parameter, since the default storage classes are unlikely to perform well enough. Refer to the Kubernetes documentation on storageClasses for how to set the storageClass Helm chart parameter and for the specifics of each cloud provider.
|Workload type||Workload element examples||Instance Recommendations||Enterprise Feature Recommendations|
|Write heavy||Consul agent joins and leaves, services registration and deregistration, key/value writes||IOPS performance of ||Network Segments|
|Read heavy||Raft RPCs calls, DNS queries, key/value retrieval||Instances of type ||Read Replicas|
We recommend completing the Monitor Consul Datacenter Health tutorial as a starting point for Consul metrics and telemetry. The following tutorials will guide you through setting up specific monitoring solutions for your Consul cluster.
- Monitor Consul Datacenter Health with Telegraf
- Observability with Prometheus, Grafana, and Kubernetes
Monitoring is critical for making sure that your Consul datacenter performs correctly. A proactive monitoring strategy helps you spot problems within the infrastructure before they cause any impact.
A good place to start with monitoring is to create baselines for your Consul cluster's metrics. After you establish the baselines, you can define alerts that notify you when a metric takes an unexpected value. For a detailed explanation of retrieving the metrics and inspecting their values, check out the Monitor Consul with Telegraf tutorial.
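One simple way to express a baseline and an alert condition is with a mean and a standard deviation. The sketch below is only an illustration: the function names are hypothetical, the sample values are made up, and the tolerance of three standard deviations is an assumption you would tune for your own environment.

```python
from statistics import mean, stdev


def baseline(samples):
    """Summarize historical metric samples as (mean, standard deviation)."""
    return mean(samples), stdev(samples)


def is_unexpected(value, base_mean, base_stdev, tolerance=3.0):
    """Flag a value that deviates from the baseline by more than
    `tolerance` standard deviations (an assumed, tunable threshold)."""
    return abs(value - base_mean) > tolerance * base_stdev


# Made-up history for a timing metric, in milliseconds.
history = [10.0, 12.0, 11.0, 9.0, 13.0]
base_mean, base_stdev = baseline(history)

print(is_unexpected(12.0, base_mean, base_stdev))  # False
print(is_unexpected(20.0, base_mean, base_stdev))  # True
```

In practice, most monitoring systems let you define an equivalent rule in their own alerting language, so a script like this is mainly useful for exploring historical data.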
These metrics indicate how long it takes to complete write operations in various parts of the Consul cluster.
- consul.kvs.apply measures the time it takes to complete an update to the KV store.
- consul.txn.apply measures the time spent applying a transaction operation.
- consul.raft.apply counts the number of Raft transactions applied during the measurement interval. This metric is only reported on the leader.
- consul.raft.commitTime measures the time it takes to commit a new entry to the Raft log on disk on the leader.
These performance indicators can help you diagnose if the current instance sizing is unable to handle the workload.
- consul.runtime.alloc_bytes measures the number of bytes allocated by the Consul process.
- consul.runtime.sys_bytes measures the total number of bytes of memory obtained from the OS.
- consul.runtime.heap_objects measures the number of objects allocated on the heap and is a general memory pressure indicator.
Leadership changes are not a cause for concern in themselves, but they are often a symptom of a problem. Frequent elections or leadership changes may indicate network issues between the Consul servers, or that the Consul servers are unable to keep up with the load.
- consul.raft.leader.lastContact measures the time since the leader was last able to contact the follower nodes when checking its leader lease.
- consul.raft.state.candidate increments whenever a Consul server starts an election.
- consul.raft.state.leader increments whenever a Consul server becomes a leader.
- consul.server.isLeader tracks whether a server is a leader.
Network activity and RPC count measurements indicate the current load created from a Consul agent, including when the load becomes high enough to be rate limited. If an unusually high RPC count occurs, you should investigate before it overloads the cluster.
- consul.client.rpc increments whenever a Consul agent in client mode makes an RPC request to a Consul server.
- consul.client.rpc.exceeded increments whenever a Consul agent in client mode makes an RPC request to a Consul server and the request gets rate limited by that agent's limits configuration.
- consul.client.rpc.failed increments whenever a Consul agent in client mode makes an RPC request to a Consul server and fails.
The recommended maximum size for a single datacenter is 5,000 Consul client agents. This recommendation is based on a standard, non-tuned environment and takes blast radius into account as a risk management factor. The maximum number of agents may be lower, depending on how you use Consul (for example, in a write-heavy or read-heavy datacenter).
If you require more than 5,000 client agents, you should:
- Break up the single Consul datacenter into multiple smaller datacenters. If the nodes are spread across separate physical locations (e.g. across different regions), it will be easy to model your multiple datacenter structures based on physical locations.
- Add network segments if every segment has low latency between clients and servers (e.g. within the same availability zone/region).
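As a worked example of the first option, the minimum number of datacenters follows from a ceiling division by the recommended maximum. The function name is hypothetical, and the result is only a lower bound, since a tuned or heavily loaded environment may support fewer agents per datacenter.

```python
import math

# Recommended maximum number of client agents in a single datacenter.
MAX_CLIENTS_PER_DATACENTER = 5000


def min_datacenters(client_agents: int) -> int:
    """Smallest number of datacenters that keeps each one at or below
    the recommended maximum number of client agents."""
    if client_agents < 0:
        raise ValueError("client_agents must not be negative")
    return max(1, math.ceil(client_agents / MAX_CLIENTS_PER_DATACENTER))


print(min_datacenters(12000))  # 3
```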
- Requests will allocate the required resources for your Consul workloads.
- Defining limits will prevent your pods from being terminated and restarted if they consume more resources than requested and Kubernetes needs to reclaim these resources. This will prevent outage situations where the Consul leader container gets terminated and redeployed due to resource constraints.
The following is an example Helm configuration that allocates 16 CPU cores and 64 Gigabytes of memory:
global:
  image: "hashicorp/consul"
## ...
resources:
  requests:
    memory: '64G'
    cpu: '16000m'
  limits:
    memory: '64G'
    cpu: '16000m'
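To see why these values correspond to 16 cores and 64 gigabytes, it helps to decode the Kubernetes quantity suffixes. The sketch below is a deliberately partial parser with hypothetical function names. It handles only the `m` CPU suffix and a few memory suffixes, and it also shows that `64G` (decimal) is not the same amount as `64Gi` (binary).

```python
# Decimal and binary memory suffixes handled by this sketch (not exhaustive).
_MEMORY_UNITS = {
    "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "M": 10**6, "G": 10**9, "T": 10**12,
}


def cpu_cores(quantity: str) -> float:
    """Convert a CPU quantity such as '16000m' (millicores) or '16' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '64G' or '64Gi' to bytes."""
    # Two-letter suffixes come first in the dict, so 'Gi' is matched before 'G'.
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # a plain number is already in bytes


print(cpu_cores("16000m"))    # 16.0
print(memory_bytes("64G"))    # 64000000000
print(memory_bytes("64Gi"))   # 68719476736
```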
Consul uses the Raft consensus algorithm to provide consistency (as defined by CAP).
You may need to adjust Raft to suit your specific environment by tweaking the raft_multiplier configuration attribute. The raft_multiplier multiplication factor defines the trade-off between leader stability and time to recover from a leader failure.
- A low multiplier minimizes failure detection and election time, but elections may be triggered frequently in high latency situations.
- A high multiplier reduces the chances that spurious failures will cause leadership churn but it does this at the expense of taking longer to detect real failures and thus takes longer to restore cluster availability.
The value of raft_multiplier (by default 5) is a scaling factor setting and directly affects the following parameters:
|Parameter name||Default value||Derived from|
|HeartbeatTimeout||5000ms||5 x 1000ms|
|ElectionTimeout||5000ms||5 x 1000ms|
|LeaderLeaseTimeout||2500ms||5 x 500ms|
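The "Derived from" column can be reproduced with a few lines of arithmetic. This sketch takes the base values of 1000ms and 500ms from the table above. The function name is hypothetical, and rejecting multipliers below 1 is an assumption made for the sketch rather than documented Consul behavior.

```python
# Base values taken from the "Derived from" column of the table above.
BASE_TIMEOUTS_MS = {
    "HeartbeatTimeout": 1000,
    "ElectionTimeout": 1000,
    "LeaderLeaseTimeout": 500,
}


def raft_timeouts_ms(raft_multiplier: int) -> dict:
    """Scale each base timeout by the given raft_multiplier."""
    if raft_multiplier < 1:
        # Assumption for this sketch: only positive multipliers make sense.
        raise ValueError("raft_multiplier must be at least 1")
    return {name: raft_multiplier * base for name, base in BASE_TIMEOUTS_MS.items()}


print(raft_timeouts_ms(5))
# {'HeartbeatTimeout': 5000, 'ElectionTimeout': 5000, 'LeaderLeaseTimeout': 2500}
print(raft_timeouts_ms(1))
# {'HeartbeatTimeout': 1000, 'ElectionTimeout': 1000, 'LeaderLeaseTimeout': 500}
```

Comparing the two outputs makes the trade-off concrete: a multiplier of 1 detects a failed leader five times sooner than the default, at the cost of being more sensitive to latency spikes.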
You can use the consul.raft.leader.lastContact telemetry to observe how the Raft timing is performing. Wide networks with more latency will perform better with larger values of raft_multiplier, but cluster failure detection will take longer. Therefore, we do not recommend setting the Raft multiplier higher than 5 (Raft down-tuning) to accommodate slow network communication. Instead, replace the servers with more powerful ones, or minimize the network latency between nodes.
We recommend starting from a baseline perspective and performing chaos engineering testing with different values for the Raft multiplier to find the acceptable time for problem detection and recovery for the cluster. Then, you should scale the cluster and its dedicated resources with the number of workloads handled. This approach gives practitioners the best balance between pure resource growth and pure Raft tuning-focused strategies since it lets you use Raft tuning as a backup plan if you cannot scale your resources.
The types of workloads the Consul cluster handles also play an important role in Raft tuning. For example, suppose your Consul clusters are mostly static and do not handle many events. In that case, increasing the Raft multiplier may be preferable to scaling your resources, because the risk of an important event happening while the cluster is converging or re-electing a leader is lower.
On the other hand, there are environments where it may be favorable for Consul to declare its leader unhealthy after only a short amount of time being unresponsive. In a well-architected solution, fast failure detection should be beneficial and should trigger either a high-availability switch over to a redundant cluster, or a response from a solution from the higher level of the platform stack.
In this article, you learned how to select appropriate server requirements for your Consul cluster, based on how you will use Consul in your environment. Next, you learned which metrics to monitor to detect irregularities and signs of overloading. Then, you learned how to tune the Raft algorithm to adjust leader stability and time to recover.
In addition, you learned how to enforce soft limits to stop over-utilization before the current hardware fails. Finally, you learned how to perform workarounds regarding performance and capacity when hardware scaling up is no longer possible.
For even more information, please check out the Additional Reading section.
- Consul Reference Architecture
- Consul Kubernetes Reference Architecture
- Consul Server Performance
- Consul Agent Telemetry
For more insight on handling capacity-related operations with Consul, please check out the following articles: