Discovering service discovery (Zookeeper vs Consul)

Yesterday I looked into the differences between using Zookeeper and Consul for service discovery. In summary, I don’t think Consul can do anything that Zookeeper can’t be made to do, but Consul comes with it built in.

Consul

How it works

I like to think of Consul in terms of two communication layers: the client communication layer, and the Consul servers communication layer.

Clients

All Consul clients have a Consul agent running alongside the actual service (similar to Netflix’s sidecar pattern). These agents have two jobs:

  1. Do health checks
  2. Maintain service registration and liveness

Consul agents provide a built-in health check service: the agent runs a list of specified scripts at specified intervals and uses the result of each script (its exit code) to mark the service as healthy or not. The agent also exposes these health checks via HTTP. You configure the health checks and provide the scripts to run, so you can easily plug in scripts to check for disk space, memory usage, load average, TCP checks, ping, etc.
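To make the script-check idea concrete, here is a minimal sketch of what an agent-like runner could do with a single check. It assumes the Nagios-style exit-code convention that, as far as I know, Consul's script checks follow (0 is passing, 1 is warning, anything else is critical). The function name `run_check` and the timeout value are my own and not part of Consul.

```python
import subprocess
import sys


def run_check(command, timeout_seconds=10.0):
    """Run one check command and map its exit code to a health status.

    Assumed convention: exit code 0 = passing, 1 = warning, other = critical.
    Returns a (status, output) tuple.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        # A check that hangs is treated as a failure in this sketch.
        return "critical", "check timed out"
    except OSError as exc:
        # The command could not be started at all (e.g. missing script).
        return "critical", str(exc)

    status = {0: "passing", 1: "warning"}.get(result.returncode, "critical")
    return status, result.stdout.strip()


# Hypothetical check: a tiny inline script standing in for a disk-space check.
status, output = run_check([sys.executable, "-c", "print('disk ok')"])
print(status, output)
```

A real agent would run something like this on a timer for every configured check and publish the latest status; the sketch only covers a single run.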

The Consul agents also take care of registration and liveness. Each agent registers its client service with a central service registry (more on this later). It also maintains gossip-style connections with other clients. This gossip protocol distributes 3 pieces of information:

  1. Available servers
  2. Failure detection (via heartbeats)
  3. Broadcasting new messages/events (e.g. new leader)
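The failure detection piece can be illustrated with a toy heartbeat tracker. This is not how Consul implements it (its gossip layer is built on Serf, which uses a SWIM-style protocol and is considerably more involved); it is only a sketch of the basic idea that a member is considered failed once no heartbeat has been seen for some timeout. The class name `HeartbeatTracker` and its methods are hypothetical.

```python
import time


class HeartbeatTracker:
    """Toy failure detector: a member counts as failed if its last
    heartbeat is older than `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock      # injectable so the logic can be tested
        self.last_seen = {}     # member name -> time of last heartbeat

    def heartbeat(self, member):
        """Record that we just heard from `member`."""
        self.last_seen[member] = self.clock()

    def alive_members(self):
        now = self.clock()
        return sorted(
            m for m, seen in self.last_seen.items() if now - seen <= self.timeout
        )

    def failed_members(self):
        now = self.clock()
        return sorted(
            m for m, seen in self.last_seen.items() if now - seen > self.timeout
        )
```

In a gossip setting each node would keep such a table and exchange what it knows with a few random peers, rather than every node reporting to one central place.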

Servers

Consul servers are at the core of the service discovery system. They are the final source of truth for which services are available for each service type. The servers operate in a cluster that uses the Raft consensus protocol to coordinate requests and transactions.
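As a sketch of what asking the source of truth looks like from a consumer's side: to my understanding, Consul's HTTP API offers a health endpoint of the form /v1/health/service/<name>?passing that returns a JSON list of entries, each with "Node" and "Service" objects. The helper below only parses such a response; the sample data, addresses, and the function name `healthy_endpoints` are made up for illustration.

```python
import json


def healthy_endpoints(health_response_body):
    """Extract (address, port) pairs from a health-endpoint style response.

    Falls back to the node's address when the service entry does not
    carry an address of its own.
    """
    endpoints = []
    for entry in json.loads(health_response_body):
        service = entry["Service"]
        address = service.get("Address") or entry["Node"]["Address"]
        endpoints.append((address, service["Port"]))
    return endpoints


# Hypothetical response body for a service called "web".
SAMPLE_RESPONSE = json.dumps([
    {
        "Node": {"Node": "node-1", "Address": "10.0.0.5"},
        "Service": {"ID": "web-1", "Service": "web", "Address": "", "Port": 8080},
    },
    {
        "Node": {"Node": "node-2", "Address": "10.0.0.9"},
        "Service": {"ID": "web-2", "Service": "web", "Address": "10.0.0.6", "Port": 8080},
    },
])

print(healthy_endpoints(SAMPLE_RESPONSE))
```

A client would then pick one of the returned endpoints (randomly or round-robin) to talk to the service.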

The Consul servers also participate in their own gossip, but this one runs between data centers. It operates very similarly to the client gossip, and its purpose is to broadcast available servers and any new messages.

Consul vs Zookeeper

I would summarize the difference between Zookeeper and Consul as: Consul is what you would get if you built cross-data center service discovery on top of Zookeeper for a globally distributed system. It basically comes with a bunch of built-in features that you would probably find yourself building for your global service discovery:

  • A lightweight service discovery agent that registers services with the discovery cluster and also keeps track of and caches the list of available servers.
  • Health checks on the service that remove services from the cluster when they are unhealthy and add them back when they are healthy.
  • Local service discovery clusters in each data center that talk to the local clusters in other data centers for cross-dc service discovery.
  • Some sort of exposed HTTP dashboard to view service health and registration on individual servers, as well as a global dashboard.