Documentation
Get started
What is Consul?
Consul operations
Service networking
Enterprise solutions
Runtimes and platforms
HCP Consul Dedicated
Plugins, integrations, & extensions
Reference docs
Glossary

Server resource requirements reference

This page provides a reference for the server resources Consul requires for operation.

Introduction

Since Consul servers run a consensus protocol to process all write operations and are contacted on nearly all read operations, server performance is critical for overall throughput and health of a Consul cluster. Servers are generally I/O bound for writes because the underlying Raft log store performs a sync to disk every time an entry is appended. Servers are generally CPU bound for reads since reads work from a fully in-memory data store that is optimized for concurrent access.

Hardware requirements

The minimum hardware requirements for Consul servers in production clusters are:

CPU	Memory	Disk Capacity	Disk IO	Disk Throughput	Avg Round-Trip-Time	99% Round-Trip-Time
8-16 core	32-64 GB RAM	200+ GB	7500+ IOPS	250+ MB/s	Lower than 50ms	Lower than 100ms

Hardware requirements increase based on the size of the Consul cluster, the number of client agents, and workload characteristics such as read-heavy and write-heavy usage. While Consul may run on smaller hardware configurations for testing and development purposes, we recommend these minimum specifications to ensure optimal performance and reliability in production environments and avoid having to upgrade hardware during a capacity outage.

The following table provides our recommended starting point for hardware instances on each major cloud infrastructure provider.

Provider	Instance/VM Types	Disk Volume Specs
AWS	`m5.2xlarge`, `m5.4xlarge`	200+GB `gp3`, 10000 IOPS, 250MB/s
Azure	`Standard_D8s_v3`, `Standard_D16s_v3`	2048GB* `Premium SSD`, 7500 IOPS, 200MB/s
GCP	`n2-standard-8`, `n2-standard-16`	1000GB* `pd-ssd`, 30000 IOPS, 480MB/s

Raft performance tuning defaults

The default values for Consul server performance parameters assume Consul runs on a server cluster of three AWS t2.micro instances. These thresholds were determined empirically using a leader instance that was under sufficient read, write, and network load to cause it to permanently be at zero CPU credits, forcing it to the baseline performance mode for that instance type. Real-world workloads typically have more bursts of activity, making this a conservative tuning strategy.

We chose these defaults so that you can run small production or development clusters with lower cost compute resources, at the expense of some performance in leader failure detection and leader election times.

The default settings are equivalent to the following Consul server agent configuration:

{
  "performance": {
    "raft_multiplier": 5
  }
}

In production, we recommend configuring the Consul server performance parameters to 1, the lowest possible Raft multiplier setting. This setting allows Consul servers to detect a failed leader and complete leader elections quicker than the default configuration, at the expense of higher resource consumption. We recommend this higher performance configuration for Consul clusters, in production unless there is a specific reason to use the slower default configuration. For example, workloads running on lower end hardware, on a high-latency network, or on instances that share workloads may not have access to additional resources or even require them to operate at the highest performance level.

The high performance settings for a Consul server agent are represented in the following configuration:

{
  "performance": {
    "raft_multiplier": 1
  }
}

This value must take into account the network latency between the servers and the read/write load on the servers.

The value of raft_multiplier is a scaling factor and directly affects the following parameters:

Param	Value
HeartbeatTimeout	1000ms	default
ElectionTimeout	1000ms	default
LeaderLeaseTimeout	500ms	default

By default, Consul uses a scaling factor of 5 (i.e. raft_multiplier: 5), which results in the following values:

Param	Value	Calculation
HeartbeatTimeout	5000ms	5 x 1000ms
ElectionTimeout	5000ms	5 x 1000ms
LeaderLeaseTimeout	2500ms	5 x 500ms

NOTE Wide networks with more latency will perform better with larger values of raft_multiplier.

The trade off is between leader stability and time to recover from an actual leader failure. A short multiplier minimizes failure detection and election time but may be triggered frequently in high latency situations. This can cause constant leadership churn and associated unavailability. A high multiplier reduces the chances that spurious failures will cause leadership churn but it does this at the expense of taking longer to detect real failures and thus takes longer to restore cluster availability.

Leadership instability can also be caused by under-provisioned CPU resources and is more likely in environments where CPU cycles are shared with other workloads. In order for a server to remain the leader, it must send frequent heartbeat messages to all other servers every few hundred milliseconds. If some number of these are missing or late due to the leader not having sufficient CPU to send them on time, the other servers will detect it as failed and hold a new election.

It's best to benchmark with a realistic workload when choosing a production server for Consul. Here are some general recommendations:

Consul will make use of multiple cores, and at least 2 cores are recommended.
Spurious leader elections can be caused by networking issues between the servers or insufficient CPU resources. Users in cloud environments often bump their servers up to the next instance class with improved networking and CPU until leader elections stabilize, and in Consul 0.7 or later the performance parameters configuration now gives you tools to trade off performance instead of upsizing servers. You can use the consul.raft.leader.lastContact telemetry to observe how the Raft timing is performing and guide the decision to de-tune Raft performance or add more powerful servers.
For DNS-heavy workloads, configuring all Consul agents in a cluster with the allow_stale configuration option will allow reads to scale across all Consul servers, not just the leader. Consul 0.7 and later enables stale reads for DNS by default. See Stale Reads in the DNS Caching guide for more details. It's also good to set reasonable, non-zero DNS TTL values if your clients will respect them.
In other applications that perform high volumes of reads against Consul, consider using the stale consistency mode available to allow reads to scale across all the servers and not just be forwarded to the leader.
In Consul 0.9.3 and later, a new limits configuration is available on Consul clients to limit the RPC request rate they are allowed to make against the Consul servers. After hitting the limit, requests will start to return rate limit errors until time has passed and more requests are allowed. Configuring this across the cluster can help with enforcing a max desired application load level on the servers, and can help mitigate abusive applications.

Memory Requirements

Consul server agents operate on a working set of data comprised of key/value entries, the service catalog, prepared queries, access control lists, and sessions in memory. These data are persisted through Raft to disk in the form of a snapshot and log of changes since the previous snapshot for durability.

When planning for memory requirements, you should typically allocate enough RAM for your server agents to contain between 2 to 4 times the working set size. You can determine the working set size by noting the value of consul.runtime.alloc_bytes in the Telemetry data.

NOTE: Consul is not designed to serve as a general purpose database, and you should keep this in mind when choosing what data are populated to the key/value store.

Read/Write Tuning

Consul is write limited by disk I/O and read limited by CPU. Memory requirements will be dependent on the total size of KV pairs stored and should be sized according to that data (as should the hard drive storage). The limit on a key's value size is 512KB.

Consul is write limited by disk I/O and read limited by CPU.

For write-heavy workloads, the total RAM available for overhead must approximately be equal to

RAM NEEDED = number of keys * average key size * 2-3x

Since writes must be synced to disk (persistent storage) on a quorum of servers before they are committed, deploying a disk with high write throughput (or an SSD) will enhance performance on the write side. (Documentation)

For a read-heavy workload, configure all Consul server agents with the allow_stale DNS option, or query the API with the stale consistency mode. By default, all queries made to the server are RPC forwarded to and serviced by the leader. By enabling stale reads, any server will respond to any query, thereby reducing overhead on the leader. Typically, the stale response is 100ms or less from consistent mode but it drastically improves performance and reduces latency under high load.

If the leader server is out of memory or the disk is full, the server eventually stops responding, loses its election and cannot move past its last commit time. However, by configuring max_stale and setting it to a large value, Consul will continue to respond to queries during such outage scenarios. (max_stale documentation).

It should be noted that stale is not appropriate for coordination where strong consistency is important (i.e. locking or application leader election). For critical cases, the optional consistent API query mode is required for true linearizability; the trade off is that this turns a read into a full quorum write so requires more resources and takes longer.

Read-heavy clusters may take advantage of the enhanced reading feature (Enterprise) for better scalability. This feature allows additional servers to be introduced as non-voters. Being a non-voter, the server will still participate in data replication, but it will not block the leader from committing log entries.

Consul's agents use network sockets for communicating with the other nodes (gossip) and with the server agent. In addition, file descriptors are also opened for watch handlers, health checks, and log files. For a write heavy cluster, the ulimit size must be increased from the default value (1024) to prevent the leader from running out of file descriptors.

To prevent any CPU spikes from a misconfigured client, RPC requests to the server should be rate limited.

NOTE Rate limiting is configured on the client agent only.

In addition, two performance indicators — consul.runtime.alloc_bytes and consul.runtime.heap_objects — can help diagnose if the current sizing is not adequately meeting the load.

Service Mesh Certificate Signing CPU Limits

If you enable service mesh, the leader server will need to perform public key signing operations for every service instance in the cluster. Typically these operations are fast on modern hardware, however when the CA is changed or its key rotated, the leader will face an influx of requests for new certificates for every service instance running.

While the client agents distribute these randomly over 30 seconds to avoid an immediate thundering herd, they don't have enough information to tune that period based on the number of certificates in use in the cluster so picking longer smearing results in artificially slow rotations for small clusters.

Smearing requests over 30s is sufficient to bring RPC load to a reasonable level in all but the very largest clusters, but the extra CPU load from cryptographic operations could impact the server's normal work. To limit that, Consul since 1.4.1 exposes two ways to limit the impact Certificate signing has on the leader csr_max_per_second and csr_max_concurrent.

By default we set a limit of 50 per second which is reasonable on modest hardware but may be too low and impact rotation times if more than 1500 service instances are using service mesh in the cluster. csr_max_per_second is likely best if you have fewer than four cores available since a whole core being used by signing is likely to impact the server stability if it's all or a large portion of the cores available. The downside is that you need to capacity plan: how many service instances will need service mesh certificates? What CSR rate can your server tolerate without impacting stability? How fast do you want CA rotations to process?

For larger production deployments, we generally recommend multiple CPU cores for servers to handle the normal workload. With four or more cores available, it's simpler to limit signing CPU impact with csr_max_concurrent rather than tune the rate limit. This effectively sets how many CPU cores can be monopolized by certificate signing work (although it doesn't pin that work to specific cores). In this case csr_max_per_second should be disabled (set to 0).

For example if you have an 8 core server, setting csr_max_concurrent to 1 would allow you to process CSRs as fast as a single core can (which is likely sufficient for the very large clusters), without consuming all available CPU cores and impacting normal server work or stability.