Consistency
Consul uses an anti-entropy mechanism to keep service and health information consistent across the cluster. This page details how services and checks are registered, how the catalog is populated, and how health status information is updated as it changes.
Anti-Entropy
Entropy is the tendency of systems to become increasingly disordered. Consul's anti-entropy mechanisms are designed to counter this tendency, to keep the state of the cluster ordered even through failures of its components.
Consul maintains a clear separation between the global service catalog and each agent's local state. The anti-entropy mechanism reconciles these two views of the world: anti-entropy is a synchronization of the local agent state and the catalog. For example, when a user registers a new service or check with the agent, the agent in turn notifies the catalog that this new service or check exists. Similarly, when a check is deleted from the agent, it is consequently removed from the catalog as well.
Anti-entropy is also used to update availability information. As agents run their health checks, their status may change, in which case the new status is synced to the catalog. Using this information, the catalog can respond intelligently to queries about its nodes and services based on their availability.
During this synchronization, the catalog is also checked for correctness. If any services or checks exist in the catalog that the agent is not aware of, they will be automatically removed to make the catalog reflect the proper set of services and health information for that agent. Consul treats the state of the agent as authoritative; if there are any differences between the agent and catalog view, the agent-local view will always be used.
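As a concrete example of the agent-side registration that anti-entropy then syncs outward, the following sketch uses Consul's official Go API client (github.com/hashicorp/consul/api) to register a service and health check with the local agent. The service name, ID, port, and check URL are illustrative assumptions.

```go
package main

import (
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	// Connect to the local agent (default: 127.0.0.1:8500).
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Register a service and its health check with the *agent*.
	// Anti-entropy is what pushes this local state into the catalog;
	// the name, port, and check URL here are illustrative.
	reg := &api.AgentServiceRegistration{
		ID:   "web-1",
		Name: "web",
		Port: 8080,
		Check: &api.AgentServiceCheck{
			HTTP:     "http://127.0.0.1:8080/health",
			Interval: "10s",
			Timeout:  "1s",
		},
	}
	if err := client.Agent().ServiceRegister(reg); err != nil {
		log.Fatal(err)
	}

	// Deregistering from the agent likewise removes the entry from the
	// catalog on the next sync, since the agent's view is authoritative:
	// client.Agent().ServiceDeregister("web-1")
}
```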
Periodic Synchronization
In addition to running when changes to the agent occur, anti-entropy is also a long-running process which periodically wakes up to sync service and check status to the catalog. This ensures that the catalog closely matches the agent's true state. This also allows Consul to re-populate the service catalog even in the case of complete data loss.
To avoid saturation, the amount of time between periodic anti-entropy runs will vary based on cluster size. The table below defines the relationship between cluster size and sync interval:
| Cluster Size | Periodic Sync Interval |
| --- | --- |
| 1 - 128 | 1 minute |
| 129 - 256 | 2 minutes |
| 257 - 512 | 3 minutes |
| 513 - 1024 | 4 minutes |
| ... | ... |
The intervals above are approximate. Each Consul agent will choose a randomly staggered start time within the interval window to avoid a thundering herd.
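To make the scaling concrete, here is a minimal Go sketch that reproduces the intervals in the table and the randomly staggered start described above. It is an illustration consistent with the documented behavior, not Consul's actual implementation, and the function names are assumptions.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
	"time"
)

// syncInterval reproduces the table above: a base interval of one
// minute that grows by one minute for each doubling of cluster size
// beyond 128 nodes. A sketch of the scaling rule implied by the
// table, not a copy of Consul's internal code.
func syncInterval(clusterSize int) time.Duration {
	base := time.Minute
	if clusterSize <= 128 {
		return base
	}
	multiplier := math.Ceil(math.Log2(float64(clusterSize))-math.Log2(128)) + 1
	return time.Duration(multiplier) * base
}

// stagger picks a random start offset within the interval window so
// that agents do not all sync at once (avoiding a thundering herd).
func stagger(interval time.Duration) time.Duration {
	return time.Duration(rand.Int63n(int64(interval)))
}

func main() {
	for _, n := range []int{100, 200, 400, 1000} {
		iv := syncInterval(n)
		fmt.Printf("%4d nodes: sync every %v, first run after %v\n", n, iv, stagger(iv))
	}
}
```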
Best-effort sync
Anti-entropy can fail in a number of cases, including misconfiguration of the agent or its operating environment, I/O problems (full disk, filesystem permissions, etc.), and networking problems (the agent cannot communicate with the servers). Because of this, the agent attempts to sync in a best-effort fashion.
If an error is encountered during an anti-entropy run, the error is logged and the agent continues to run. The anti-entropy mechanism is run periodically to automatically recover from these types of transient failures.
Enable Tag Override
Synchronization of service registration can be partially modified to allow external agents to change the tags for a service. This can be useful in situations where an external monitoring service needs to be the source of truth for tag information. For example, the Redis database and its monitoring service Redis Sentinel have this kind of relationship: Redis instances are responsible for much of their own configuration, but Sentinels determine whether a Redis instance is a primary or a secondary. Enable the enable_tag_override parameter in the service definition on the Consul agent where the Redis database is running so that anti-entropy synchronization does not overwrite externally modified tags. Refer to Modify anti-entropy synchronization for additional information. A sketch of such a registration follows.
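The sketch below, using Consul's official Go API client, registers a hypothetical Redis instance with tag override enabled (the EnableTagOverride field maps to enable_tag_override); the ID, port, and initial tag are illustrative assumptions.

```go
package main

import (
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Register a Redis service whose tags an external system (such as
	// Sentinel) may manage. With EnableTagOverride set, anti-entropy
	// leaves catalog-modified tags in place instead of overwriting them
	// with the agent's local copy. ID, port, and tags are illustrative.
	reg := &api.AgentServiceRegistration{
		ID:                "redis-1",
		Name:              "redis",
		Port:              6379,
		Tags:              []string{"secondary"}, // may be flipped to "primary" externally
		EnableTagOverride: true,
	}
	if err := client.Agent().ServiceRegister(reg); err != nil {
		log.Fatal(err)
	}
}
```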
Consistency Modes
Although all writes to the replicated log go through Raft, reads are more flexible. To support various trade-offs that developers may want, Consul supports 3 different consistency modes for reads.
The three read modes are:
default
- Raft makes use of leader leasing, providing a time window in which the leader assumes its role is stable. However, if a leader is partitioned from the remaining peers, a new leader may be elected while the old leader is holding the lease. This means there are 2 leader nodes. There is no risk of a split-brain since the old leader will be unable to commit new logs. However, if the old leader services any reads, the values are potentially stale. The default consistency mode relies only on leader leasing, exposing clients to potentially stale values. We make this trade-off because reads are fast, usually strongly consistent, and only stale in a hard-to-trigger situation. The time window of stale reads is also bounded since the leader will step down due to the partition.
consistent
- This mode is strongly consistent without caveats. It requires that a leader verify with a quorum of peers that it is still leader. This introduces an additional round-trip to all server nodes. The trade-off is always-consistent reads but increased latency due to the extra round trip.
stale
- This mode allows any server to service the read regardless of whether it is the leader. This means reads can be arbitrarily stale but are generally within 50 milliseconds of the leader. The trade-off is very fast and scalable reads but with stale values. This mode allows reads without a leader, meaning a cluster that is unavailable will still be able to respond.
For more documentation about using these various modes, see the HTTP API.
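In Consul's official Go API client, these modes are selected per query through QueryOptions, which correspond to the consistent and stale parameters of the HTTP API. The sketch below reads a KV entry under each mode; the key name is an illustrative assumption.

```go
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/consul/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	kv := client.KV()

	// default: no options set; the leader answers using its lease.
	// The key name "service/config" is an illustrative assumption.
	pair, _, err := kv.Get("service/config", nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("default read:", pair)

	// consistent: the leader confirms leadership with a quorum first,
	// trading an extra round trip for reads with no staleness caveat.
	pair, _, err = kv.Get("service/config", &api.QueryOptions{RequireConsistent: true})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("consistent read:", pair)

	// stale: any server may answer, even without a leader. QueryMeta
	// reports an upper bound on staleness and whether a leader was known.
	pair, meta, err := kv.Get("service/config", &api.QueryOptions{AllowStale: true})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("stale read:", pair, "last contact:", meta.LastContact, "leader known:", meta.KnownLeader)
}
```

When AllowStale is set, the LastContact and KnownLeader fields of QueryMeta let callers judge how stale a result may be and fall back to a consistent read if needed.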
Jepsen Testing Results
Jepsen is a tool, written by Kyle Kingsbury, designed to test the partition tolerance of distributed systems. It creates network partitions while fuzzing the system with random operations. The results are analyzed to see if the system violates any of the consistency properties it claims to have.
As part of our Consul testing, we ran a Jepsen test to determine if any consistency issues could be uncovered. In our testing, Consul gracefully recovered from partitions without introducing any consistency issues.
Running the tests
At the moment, testing with Jepsen is rather complex as it requires setting up multiple virtual machines, SSH keys, DNS configuration, and a working Clojure environment. We hope to contribute our Consul testing code upstream and to provide a Vagrant environment for Jepsen testing soon.
Output
Below is the output captured from Jepsen. We ran Jepsen multiple times, and it passed each time. This output is only representative of a single run and has been edited for length. Please reach out on Consul's Discuss if you would like to reproduce the Jepsen results.
```
$ lein test :only jepsen.system.consul-test
lein test jepsen.system.consul-test
INFO jepsen.os.debian - :n5 setting up debian
INFO jepsen.os.debian - :n3 setting up debian
INFO jepsen.os.debian - :n4 setting up debian
INFO jepsen.os.debian - :n1 setting up debian
INFO jepsen.os.debian - :n2 setting up debian
INFO jepsen.os.debian - :n4 debian set up
INFO jepsen.os.debian - :n5 debian set up
INFO jepsen.os.debian - :n3 debian set up
INFO jepsen.os.debian - :n1 debian set up
INFO jepsen.os.debian - :n2 debian set up
INFO jepsen.system.consul - :n1 consul nuked
INFO jepsen.system.consul - :n4 consul nuked
INFO jepsen.system.consul - :n5 consul nuked
INFO jepsen.system.consul - :n3 consul nuked
INFO jepsen.system.consul - :n2 consul nuked
INFO jepsen.system.consul - Running nodes: {:n1 false, :n2 false, :n3 false, :n4 false, :n5 false}
INFO jepsen.system.consul - :n2 consul nuked
INFO jepsen.system.consul - :n3 consul nuked
INFO jepsen.system.consul - :n4 consul nuked
INFO jepsen.system.consul - :n5 consul nuked
INFO jepsen.system.consul - :n1 consul nuked
INFO jepsen.system.consul - :n1 starting consul
INFO jepsen.system.consul - :n2 starting consul
INFO jepsen.system.consul - :n4 starting consul
INFO jepsen.system.consul - :n5 starting consul
INFO jepsen.system.consul - :n3 starting consul
INFO jepsen.system.consul - :n3 consul ready
INFO jepsen.system.consul - :n2 consul ready
INFO jepsen.system.consul - Running nodes: {:n1 true, :n2 true, :n3 true, :n4 true, :n5 true}
INFO jepsen.system.consul - :n5 consul ready
INFO jepsen.system.consul - :n1 consul ready
INFO jepsen.system.consul - :n4 consul ready
INFO jepsen.core - Worker 0 starting
INFO jepsen.core - Worker 2 starting
INFO jepsen.core - Worker 1 starting
INFO jepsen.core - Worker 3 starting
INFO jepsen.core - Worker 4 starting
INFO jepsen.util - 2 :invoke :read nil
INFO jepsen.util - 3 :invoke :cas [4 4]
INFO jepsen.util - 0 :invoke :write 4
INFO jepsen.util - 1 :invoke :write 1
INFO jepsen.util - 4 :invoke :cas [4 0]
INFO jepsen.util - 2 :ok :read nil
INFO jepsen.util - 4 :fail :cas [4 0]
(Log Truncated...)
INFO jepsen.util - 4 :invoke :cas [3 3]
INFO jepsen.util - 4 :fail :cas [3 3]
INFO jepsen.util - :nemesis :info :stop nil
INFO jepsen.util - :nemesis :info :stop "fully connected"
INFO jepsen.util - 0 :fail :read nil
INFO jepsen.util - 1 :fail :write 0
INFO jepsen.util - :nemesis :info :stop nil
INFO jepsen.util - :nemesis :info :stop "fully connected"
INFO jepsen.core - nemesis done
INFO jepsen.core - Worker 3 done
INFO jepsen.util - 1 :invoke :read nil
INFO jepsen.core - Worker 2 done
INFO jepsen.core - Worker 4 done
INFO jepsen.core - Worker 0 done
INFO jepsen.util - 1 :ok :read 3
INFO jepsen.core - Worker 1 done
INFO jepsen.core - Run complete, writing
INFO jepsen.core - Analyzing
(Log Truncated...)
INFO jepsen.core - Analysis complete
INFO jepsen.system.consul - :n3 consul nuked
INFO jepsen.system.consul - :n2 consul nuked
INFO jepsen.system.consul - :n4 consul nuked
INFO jepsen.system.consul - :n1 consul nuked
INFO jepsen.system.consul - :n5 consul nuked
1964 element history linearizable. :D
Ran 1 tests containing 1 assertions.
0 failures, 0 errors.
```