Redis and Memcache – Listening And Learning

I’m launching into a project to implement a distribute cache in front of our bidding and impression layer. Part of the project is to evaluate caching technologies and the most popular technologies are: redis and memcache.

The workload on the cache will be highly distributed and very write-intensive, with many clients and many more writes than reads. Therefore whatever system I end up with will have to be able to support many concurrent clients with an low latency writes and an efficient cache eviction scheme.

The Features

Redis

Redis’s is supposed to be basically memcache 2.0 in that it’s supposed to do everything memcache and more! From my research it looks like Redis is much more feature rich with out-of-the-box support for data structures, more eviction strategies, and a richer command set for more fine-grained control of the cache in aspects like memory management.

Pros:

Richer feature set
Native support for some data structures (e.g. hyperloglog, sorted lists)
More control over cache internals and cache eviction
Supports persistence to disk
Single-threaded design makes horizontal scaling within a host easier (need more concurrency? just create more instances!)

Cons:

Single-threaded
More control over the cache storage means more chances of performance-degrading configuration

Memcache

Memcache is supposed to do one simple thing and do it well: in-memory key-value storage. It really doesn’t do much more than that so it’s easier to set up and straightforward to understand and set up but of course that is at the cost of less knobs to tweak and less control over the cache.

Pros:

Simple memory management (slab allocator)
More memory efficient because it stores minimal metadata
Multithreaded
Faster (see benchmark)

Cons:

Memory management can result in fragmented memory if stored object sizes varies

The Benchmarks

First to understand the performance of redis vs memcache, here is yet another redis vs memcache benchmark. As previously mentioned the caching layer that I’m trying to build will be very write intensive and it is sensitive to write latency since the read could come quickly after the write (lowest I’ve seen is 3ms). Therefore while throughput is of course interesting since overall we will have to be able to handle a certain volume of data, latency is perhaps the more interesting metric.

The benchmark setup is on my development Ubuntu machine (8-cores 1.6GHz w/ 32 GB of RAM) writing ~5M of the Java object that I will ultimately be caching (immutable object containing ~200 fields with a variety of mostly primatives and a few Java objects). The 5M objects have already been serialized into files so for the benchmark I’ll be deserializing the objects and writing them into the cache. For each technology I spin up one instance on my local machine and then wrote a simple Java main method that assigns the files to a thread pool where each thread would write one object to the cache. I excluded the deserialization and and encoding logic out of the benchmark so I only surrounded the cache “set” call with timestamps. So the thread looks like this:

Object obj;
try (
      FileInputStream fin = new FileInputStream(inputFile);
      ObjectInputStream ois = new ObjectInputStream(fin)
) {
   while ((obj = ois.readObject()) != null) {

      long t0 = System.nanoTime();

cache.set(getKey(obj), obj);

      long storeTime = System.nanoTime() - t0;
   }

} catch (EOFException ex) {
   System.out.println(inputFile.getName() + " completed.");
} catch(Exception ex){
   ex.printStackTrace();
}

Here are my findings:

Redis

# of clients (threads)	Throughput (stores/s)	Latency (median)	Latency (95th)	Latency (99th)
1	5,124	138 us	293 us	345 us
8	27,533	45.8 us	260 us	278 us
16	28,569	257 us	505 us	713 us
32	28,416	718 us	1.18 ms	1.64 ms

Interesting note: This is not surprising since Redis is single-threaded but it does look like that there is a hard upper limit to throughput (~28K stores/s) since at some point the single Redis thread/CPU core will be saturated.

Memcache

# of clients (threads)	Throughput (stores/s)	Latency (median)	Latency (95th)	Latency (99th)
1	11,995	50.3 us	91.1 us	117 us
8	57,139	95.0 us	147 us	219 us
16	42,174	101 us	197 us	303 us
32	84,348	261 us	324 us	427 us

Note:

I am aware that since Redis is single-threaded this isn’t completely a fair benchmark since the suggested way to increase Redis concurrency is to create more instances of Redis and spread out the writes across the instance but since we will have significantly more writing clients than cache hosts, it is important to me that the cache will be able to handle many concurrent writes. There comes a point when creating more instances of Redis becomes an operational nightmare but perhaps one day it is worth investigating this benchmark over more Redis instances. In any case this is my attempt to understand concurrent write loads on a single instance of each cache technology.

Conclusions

The system that I’ll be building will be using the cache in a very simple way: storing an objects then sometimes getting the object. The cache won’t even need update support since the system will be write once-and-for-all. Since for now this system will be used to store and get an object as-is without any need for partial updates it doesn’t look like we’ll need the rich feature set of Redis.

Additionally, it looks like for our workload memcache performs much faster than Redis, even in the single-client case.

I am indeed concerned about memcache’s simple memory management that may result in fragmented memory but I think I may have a solution for that (learned from Facebook’s usage of memcache). But that will be another post.

In conclusion it seems pretty convincing to me that memcached will be the better option for my use case given the performance as well as the simplicity of our workload.