Consistency

This page provides conceptual information about Consul's anti-entropy mechanism, which keeps Consul catalog results consistent across nodes in a datacenter.

Introduction

Entropy is the tendency of systems to become increasingly disordered over time. Consul includes anti-entropy mechanisms to counter this tendency and keep the state of the cluster ordered, even when cluster components fail.

In Consul, there is a distinction between the global service catalog and the agent's local state. Agents forward information about services and their registered health checks to the leader node in the cluster, which replicates the authoritative global service catalog to the other server nodes. As a result, any node in a Consul cluster may have catalog information that differs from the other nodes at a specific moment in time.

Consul's anti-entropy mechanism reconciles catalog differences by periodically synchronizing the local agent state with the catalog.

For example, when a user registers a new service or check with the agent, the agent notifies the leader that this new check exists, and the leader updates the catalog. Similarly, when a check is deleted from the agent, the agent notifies the leader to update the catalog. Using this information, the catalog can respond intelligently to queries about its nodes and services based on their availability.

Consul treats the state of the agent as authoritative. If there are any differences between the agent's view and the catalog view, the agent uses its local view.
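
The reconciliation rule above can be illustrated with a minimal sketch. It is not Consul's implementation: the function names `plan_sync` and `apply_sync` and the dictionary layout are hypothetical, chosen only to show the agent's local state overriding the catalog wherever the two differ.

```python
def plan_sync(local: dict, catalog: dict) -> dict:
    """Work out what must change so the catalog matches the agent's local state.

    Both arguments map a service ID to its definition. The local state is
    treated as authoritative, mirroring the rule described above.
    """
    # Entries the catalog is missing, or holds with a different definition.
    register = {sid: definition for sid, definition in local.items()
                if catalog.get(sid) != definition}
    # Entries the catalog still lists but the agent no longer knows about.
    deregister = sorted(sid for sid in catalog if sid not in local)
    return {"register": register, "deregister": deregister}


def apply_sync(catalog: dict, plan: dict) -> dict:
    """Return a new catalog view with the planned changes applied."""
    updated = {sid: definition for sid, definition in catalog.items()
               if sid not in plan["deregister"]}
    updated.update(plan["register"])
    return updated
```

After `apply_sync` runs, the catalog view for that agent equals the agent's local view, which is the outcome the anti-entropy mechanism aims for.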

Periodic synchronization

Consul's anti-entropy mechanism is a long-running process. In addition to detecting agent changes, it periodically syncs service and health check information to the catalog. This sync ensures that the catalog closely matches the agent's actual state.

This capability also allows Consul to re-populate the service catalog, even in the case of complete data loss.

The amount of time between periodic anti-entropy runs varies based on cluster size. The following table describes the relationship between cluster size, counted by the number of nodes in the cluster, and sync interval:

Cluster Size	Periodic Sync Interval
1 - 128	1 minute
129 - 256	2 minutes
257 - 512	3 minutes
513 - 1024	4 minutes
...	...

These intervals are approximate. To avoid too many nodes syncing at one time, each Consul agent randomly chooses a staggered start time within the interval window.
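
The rows in the table are consistent with an interval that grows by one minute each time the cluster size doubles past 128 nodes. The sketch below reproduces the listed rows under that reading. Extending the pattern beyond 1024 nodes is an assumption, because the table elides those rows, and the names `sync_interval_seconds` and `staggered_start` are hypothetical.

```python
import random

BASE_INTERVAL_SECONDS = 60  # 1 minute for clusters of up to 128 nodes (from the table)
SCALE_THRESHOLD = 128       # cluster size above which the interval starts to grow


def sync_interval_seconds(cluster_size: int) -> int:
    """Approximate periodic sync interval for a cluster of the given size."""
    if cluster_size <= SCALE_THRESHOLD:
        return BASE_INTERVAL_SECONDS
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2, using integer math only.
    doublings = (cluster_size - 1).bit_length() - (SCALE_THRESHOLD.bit_length() - 1)
    return BASE_INTERVAL_SECONDS * (doublings + 1)


def staggered_start(cluster_size: int, rng=None) -> float:
    """Pick a random offset inside the interval window so agents do not sync at once."""
    rng = rng or random.Random()
    return rng.uniform(0, sync_interval_seconds(cluster_size))
```

For example, `sync_interval_seconds(300)` falls in the 257 - 512 row and returns 180 seconds, and `staggered_start(300)` returns an offset somewhere between 0 and 180 seconds.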

Synchronization failures

Anti-entropy runs can fail in several situations, including:

Agent misconfiguration
Misconfiguration of the agent's operating environment
I/O problems, such as a full disk or filesystem permission error
Networking problems, such as an agent being unable to communicate with the server

If an error is encountered during an anti-entropy run, the agent logs the error and continues to run. Syncs are designed to run periodically to automatically recover from these types of transient failures.
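
The log-and-continue behavior described above can be sketched as a loop that never lets one failed run stop later runs. This is an illustration only: `run_sync_rounds` and the `sync_once` callback are hypothetical names, and the timing between rounds is left out.

```python
import logging

logger = logging.getLogger("anti_entropy_sketch")


def run_sync_rounds(sync_once, rounds: int) -> int:
    """Call sync_once() `rounds` times and return how many calls succeeded.

    A failed round is logged and skipped, so a later round can succeed
    once the transient problem has cleared.
    """
    succeeded = 0
    for attempt in range(1, rounds + 1):
        try:
            sync_once()
            succeeded += 1
        except Exception as err:
            # Record the failure and keep going instead of exiting.
            logger.error("sync round %d failed: %s", attempt, err)
    return succeeded
```

Because each round is independent, a problem such as a temporarily full disk only costs the rounds that overlap with it.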

Consistency modes

When you use Consul's service discovery features to return a registered service instance, Consul forwards the request to the cluster's leader by default. That way, Consul returns the most recent authoritative results from the catalog.

You can change a Consul agent's consistency mode so that the agent returns services with a greater or lower degree of accuracy, depending on the needs of the workloads in your service networking environment.

There are three consistency modes for agents to return catalog information:

default - To return accurate results as quickly as possible, agents forward catalog read requests to the cluster leader. Raft uses leader leasing, which provides a set time window during which the leader assumes its role is stable. If an election occurs before the leader leasing window is complete, the old leader continues to service read requests on behalf of the entire cluster. Therefore, Consul may occasionally return a stale result, but it processes reads faster in exchange.
consistent - This mode is strongly consistent without caveats. It requires a leader to verify with a quorum of peers that it is still the leader before it returns results. This mode introduces additional traffic to all server nodes as a result. For read requests, results are always consistent, but requests have additional latency.
stale - This mode allows any server to service the read, regardless of whether it is the leader. Reads become faster and more scalable, but are more likely to return stale values. This mode also allows reads without a leader, meaning that Consul servers can still respond to requests during an outage.

For more information, refer to Consistency modes in the HTTP API documentation.
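
In the HTTP API, the non-default modes are selected by adding a `stale` or `consistent` query parameter to a read request. The sketch below only builds the request URL and sends nothing. The helper name `catalog_service_url` and the service name `web` are hypothetical, and the base address assumes an agent listening on the usual local HTTP port.

```python
from urllib.parse import quote

MODES = ("default", "consistent", "stale")


def catalog_service_url(service: str, mode: str = "default",
                        base: str = "http://127.0.0.1:8500") -> str:
    """Build a catalog read URL for the given service and consistency mode."""
    if mode not in MODES:
        raise ValueError(f"unknown consistency mode: {mode!r}")
    url = f"{base}/v1/catalog/service/{quote(service, safe='')}"
    # The default mode needs no parameter; the other two are bare query flags.
    return url if mode == "default" else f"{url}?{mode}"
```

Calling `catalog_service_url("web", "stale")` yields a URL that any server may answer, while `catalog_service_url("web", "consistent")` asks for a read that the leader verifies with a quorum first.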

Jepsen testing

Jepsen is a tool designed to test the partition tolerance of distributed systems. It creates network partitions while fuzzing the system with random operations. The results are analyzed to find out if the system violates any of the consistency properties it claims to have.

As part of our Consul testing, we ran a Jepsen test to determine if any consistency issues could be uncovered. In our testing, Consul gracefully recovered from partitions without introducing any consistency issues.

Test output

The following output was captured during our Jepsen testing.

$ lein test :only jepsen.system.consul-test

lein test jepsen.system.consul-test
INFO  jepsen.os.debian - :n5 setting up debian
INFO  jepsen.os.debian - :n3 setting up debian
INFO  jepsen.os.debian - :n4 setting up debian
INFO  jepsen.os.debian - :n1 setting up debian
INFO  jepsen.os.debian - :n2 setting up debian
INFO  jepsen.os.debian - :n4 debian set up
INFO  jepsen.os.debian - :n5 debian set up
INFO  jepsen.os.debian - :n3 debian set up
INFO  jepsen.os.debian - :n1 debian set up
INFO  jepsen.os.debian - :n2 debian set up
INFO  jepsen.system.consul - :n1 consul nuked
INFO  jepsen.system.consul - :n4 consul nuked
INFO  jepsen.system.consul - :n5 consul nuked
INFO  jepsen.system.consul - :n3 consul nuked
INFO  jepsen.system.consul - :n2 consul nuked
INFO  jepsen.system.consul - Running nodes: {:n1 false, :n2 false, :n3 false, :n4 false, :n5 false}
INFO  jepsen.system.consul - :n2 consul nuked
INFO  jepsen.system.consul - :n3 consul nuked
INFO  jepsen.system.consul - :n4 consul nuked
INFO  jepsen.system.consul - :n5 consul nuked
INFO  jepsen.system.consul - :n1 consul nuked
INFO  jepsen.system.consul - :n1 starting consul
INFO  jepsen.system.consul - :n2 starting consul
INFO  jepsen.system.consul - :n4 starting consul
INFO  jepsen.system.consul - :n5 starting consul
INFO  jepsen.system.consul - :n3 starting consul
INFO  jepsen.system.consul - :n3 consul ready
INFO  jepsen.system.consul - :n2 consul ready
INFO  jepsen.system.consul - Running nodes: {:n1 true, :n2 true, :n3 true, :n4 true, :n5 true}
INFO  jepsen.system.consul - :n5 consul ready
INFO  jepsen.system.consul - :n1 consul ready
INFO  jepsen.system.consul - :n4 consul ready
INFO  jepsen.core - Worker 0 starting
INFO  jepsen.core - Worker 2 starting
INFO  jepsen.core - Worker 1 starting
INFO  jepsen.core - Worker 3 starting
INFO  jepsen.core - Worker 4 starting
INFO  jepsen.util - 2   :invoke :read   nil
INFO  jepsen.util - 3   :invoke :cas    [4 4]
INFO  jepsen.util - 0   :invoke :write  4
INFO  jepsen.util - 1   :invoke :write  1
INFO  jepsen.util - 4   :invoke :cas    [4 0]
INFO  jepsen.util - 2   :ok :read   nil
INFO  jepsen.util - 4   :fail   :cas    [4 0]
(Log Truncated...)
INFO  jepsen.util - 4   :invoke :cas    [3 3]
INFO  jepsen.util - 4   :fail   :cas    [3 3]
INFO  jepsen.util - :nemesis    :info   :stop   nil
INFO  jepsen.util - :nemesis    :info   :stop   "fully connected"
INFO  jepsen.util - 0   :fail   :read   nil
INFO  jepsen.util - 1   :fail   :write  0
INFO  jepsen.util - :nemesis    :info   :stop   nil
INFO  jepsen.util - :nemesis    :info   :stop   "fully connected"
INFO  jepsen.core - nemesis done
INFO  jepsen.core - Worker 3 done
INFO  jepsen.util - 1   :invoke :read   nil
INFO  jepsen.core - Worker 2 done
INFO  jepsen.core - Worker 4 done
INFO  jepsen.core - Worker 0 done
INFO  jepsen.util - 1   :ok :read   3
INFO  jepsen.core - Worker 1 done
INFO  jepsen.core - Run complete, writing
INFO  jepsen.core - Analyzing
(Log Truncated...)
INFO  jepsen.core - Analysis complete
INFO  jepsen.system.consul - :n3 consul nuked
INFO  jepsen.system.consul - :n2 consul nuked
INFO  jepsen.system.consul - :n4 consul nuked
INFO  jepsen.system.consul - :n1 consul nuked
INFO  jepsen.system.consul - :n5 consul nuked
1964 element history linearizable. :D

Ran 1 tests containing 1 assertions.
0 failures, 0 errors.

We ran Jepsen multiple times, and Consul passed each time. This output is only representative of a single run and has been edited for length.