Backup and restore a Consul datacenter
This page describes how to back up and restore a Consul datacenter with snapshots.
Introduction
Backups help you recover from a Consul datacenter outage caused by any combination of network loss, operator error, or a corrupted data directory. Before your cluster is live in a production environment, we recommend that you test the restore functionality at least once.
We also recommend that you create a backup of a Consul cluster before you upgrade the version of Consul it runs. Then, if the upgrade does not go according to plan, you can downgrade to the last working version by restoring the cluster from the backup.
There are some situations where restoring from a backup is the only method to restore a Consul datacenter's previous functionality. We recommend creating backups on a regular basis to limit the Consul datacenter's potential downtime.
You can run the consul snapshot command using the CLI or the API to save or restore a point-in-time snapshot. This snapshot includes, but is not limited to:
- Key-Value entries
- the service catalog
- prepared queries
- sessions
- ACLs
With Consul Enterprise, you can run a snapshot agent daemon that periodically takes snapshots and saves them to local or remote storage, including Amazon S3, Azure Blob Storage, and Google Cloud Storage. Consul Community Edition supports custom scripts to run the consul snapshot save command or HTTP API request on a regular basis.
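As an illustration of such a custom script, the following is a minimal sketch rather than an official tool. The function name snapshot_backup, the default backup directory, and the timestamped file naming scheme are assumptions made for this example; only consul snapshot save comes from the Consul CLI.

```shell
#!/bin/sh
# Hypothetical helper: save a timestamped snapshot into a backup directory.
# The default directory and the file name pattern are illustrative assumptions.
snapshot_backup() {
  backup_dir="${1:-/var/backups/consul}"
  timestamp="$(date -u +%Y%m%dT%H%M%SZ)"
  target="${backup_dir}/consul-${timestamp}.snap"

  mkdir -p "$backup_dir" || return 1

  # `consul snapshot save` exits non-zero when the snapshot cannot be taken.
  if consul snapshot save "$target" >/dev/null; then
    # Print the path so that a caller can copy the file to remote storage.
    echo "$target"
  else
    echo "snapshot failed" >&2
    return 1
  fi
}
```

A scheduler such as cron could then call snapshot_backup at whatever interval matches your recovery objectives.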
Consistency modes
By default, Consul takes snapshots in consistent mode, where agents forward the request to the current cluster leader, which verifies that it is still the leader before it takes the snapshot. As a result, Consul cannot save snapshots if the datacenter has lost quorum or if no leader is available.
To reduce the burden on the leader, you can run the snapshot command on any server in stale consistency mode. Stale consistency mode is appropriate for scheduled backups and other non-critical procedures. Without full consistency, the snapshot may omit a small number of recent writes, although these are typically limited to data written in the last 100ms or less.
However, we still recommend you take consistent snapshots for write-heavy production use cases, or when you want to snapshot a cluster state immediately after a specific change.
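The choice between the two modes can be sketched as a small wrapper. The function name save_snapshot and its mode argument are hypothetical; the -stale flag is the consul snapshot save option that permits a non-leader server to answer the request.

```shell
# Hypothetical wrapper: take a snapshot in the requested consistency mode.
# Usage: save_snapshot <consistent|stale> <file>
save_snapshot() {
  mode="$1"
  file="$2"
  case "$mode" in
    # Default behavior: the request is forwarded to the leader.
    consistent) consul snapshot save "$file" ;;
    # -stale lets any server respond, so recent writes may be missing.
    stale)      consul snapshot save -stale "$file" ;;
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac
}
```

For example, a nightly job might call save_snapshot stale nightly.snap, while a pre-upgrade backup would use save_snapshot consistent pre-upgrade.snap.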
Workflow
- Back up the Consul datacenter: Use the consul snapshot save command to create a backup of the Consul datacenter.
- Verify the backup: Inspect the backup file to ensure it was created successfully.
- Restore from snapshot: Use the consul snapshot restore command to restore the Consul datacenter from the backup.
Backup a Consul datacenter
Run the basic snapshot command on one of the servers. Because it uses the default settings, this request runs in consistent mode.
$ consul snapshot save backup.snap
Saved and verified snapshot to index 1176
Because backup.snap is a relative path, Consul saves the backup locally in the directory where you ran the command.
You can view metadata about the backup with the inspect subcommand.
$ consul snapshot inspect backup.snap
ID 2-1182-1542056499724
Size 4115
Index 1182
Term 2
Version 1
For more information about the snapshot inspect subcommand and its output, refer to the consul snapshot inspect CLI documentation.
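The verification step can be automated along these lines. This is a sketch under stated assumptions: the function name verify_snapshot is hypothetical, and the parsing assumes the "Index <number>" line format shown in the example output above, which may differ between Consul versions.

```shell
# Hypothetical helper: fail if the snapshot file is missing, empty, or
# unreadable by `consul snapshot inspect`; otherwise print its Raft index.
verify_snapshot() {
  file="$1"
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }

  # A non-zero exit from inspect means the file is not a usable snapshot.
  meta="$(consul snapshot inspect "$file")" || return 1

  # Assumes a line of the form "Index <number>" in the inspect output.
  echo "$meta" | awk '$1 == "Index" { print $2; exit }'
}
```

A backup job could compare the printed index against the index reported by consul snapshot save to confirm that it is inspecting the file it just wrote.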
Restore a Consul datacenter
You can restore a datacenter from a Consul snapshot with the consul snapshot restore command. To succeed, the target cluster must be stable and have a leader. To check the current state of the Raft peers, run consul operator raft list-peers. If the cluster does not have a leader, check server logs and telemetry for signs of leader election or network issues.
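The leader check can be scripted as a pre-flight step before a restore. The following sketch uses a hypothetical function name, has_leader, and assumes that State is the fourth column of the list-peers table; confirm the column layout against the output of your Consul version.

```shell
# Hypothetical pre-flight check: succeed only if a Raft leader is listed.
has_leader() {
  # Skip the header row, then look for "leader" in the State column.
  # Assumes State is the fourth column of the `list-peers` output.
  consul operator raft list-peers |
    awk 'NR > 1 && $4 == "leader" { found = 1 } END { exit !found }'
}
```

One possible use is to gate the restore on the check, for example has_leader && consul snapshot restore backup.snap.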
When you restore a Consul cluster from a snapshot on new infrastructure, make sure that the node names are unique. Otherwise, you might encounter restore issues due to naming conflicts. For more information, refer to Snapshot restore error.
You do not need to run the snapshot restore process more than once. If the target Consul server is not the cluster leader, it streams the snapshot data contents to the leader. The Raft consensus protocol ensures that all servers restore the same state.
$ consul snapshot restore backup.snap
Restored snapshot
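The same restore can be performed through the HTTP API with a PUT request to the /v1/snapshot endpoint. The sketch below is illustrative only: the function name api_snapshot_restore is hypothetical, and it assumes that CONSUL_HTTP_ADDR, if set, holds an HTTP or HTTPS address that curl can reach, with a local agent on port 8500 as the fallback.

```shell
# Hypothetical helper: restore a snapshot through the HTTP API with curl.
# Assumes CONSUL_HTTP_ADDR is an http(s) address; defaults to a local agent.
api_snapshot_restore() {
  file="$1"
  addr="${CONSUL_HTTP_ADDR:-http://127.0.0.1:8500}"

  # Send the snapshot file unmodified as the request body.
  set -- --fail --silent --show-error --request PUT --data-binary "@${file}"

  # Attach an ACL token only when one is configured in the environment.
  if [ -n "${CONSUL_HTTP_TOKEN:-}" ]; then
    set -- "$@" --header "X-Consul-Token: ${CONSUL_HTTP_TOKEN}"
  fi

  curl "$@" "${addr}/v1/snapshot"
}
```

Because --fail makes curl exit non-zero on an HTTP error status, a calling script can treat the function's exit code as the result of the restore.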
Additional guidance
For more information on disaster recovery, including detailed instructions on how to back up and restore Consul datacenters, refer to the following resources: