I’m launching into a project to implement a distribute cache in front of our bidding and impression layer. Part of the project is to evaluate caching technologies and the most popular technologies are: redis and memcache.
The workload on the cache will be highly distributed and very write-intensive, with many clients and many more writes than reads. Therefore whatever system I end up with will have to be able to support many concurrent clients with an low latency writes and an efficient cache eviction scheme.
The Features
Redis
Redis’s is supposed to be basically memcache 2.0 in that it’s supposed to do everything memcache and more! From my research it looks like Redis is much more feature rich with out-of-the-box support for data structures, more eviction strategies, and a richer command set for more fine-grained control of the cache in aspects like memory management.
Pros:
- Richer feature set
- Native support for some data structures (e.g. hyperloglog, sorted lists)
- More control over cache internals and cache eviction
- Supports persistence to disk
- Single-threaded design makes horizontal scaling within a host easier (need more concurrency? just create more instances!)
Cons:
- Single-threaded
- More control over the cache storage means more chances of performance-degrading configuration
Memcache
Memcache is supposed to do one simple thing and do it well: in-memory key-value storage. It really doesn’t do much more than that so it’s easier to set up and straightforward to understand and set up but of course that is at the cost of less knobs to tweak and less control over the cache.
Pros:
- Simple memory management (slab allocator)
- More memory efficient because it stores minimal metadata
- Multithreaded
- Faster (see benchmark)
Cons:
- Memory management can result in fragmented memory if stored object sizes varies
The Benchmarks
First to understand the performance of redis vs memcache, here is yet another redis vs memcache benchmark. As previously mentioned the caching layer that I’m trying to build will be very write intensive and it is sensitive to write latency since the read could come quickly after the write (lowest I’ve seen is 3ms). Therefore while throughput is of course interesting since overall we will have to be able to handle a certain volume of data, latency is perhaps the more interesting metric.
The benchmark setup is on my development Ubuntu machine (8-cores 1.6GHz w/ 32 GB of RAM) writing ~5M of the Java object that I will ultimately be caching (immutable object containing ~200 fields with a variety of mostly primatives and a few Java objects). The 5M objects have already been serialized into files so for the benchmark I’ll be deserializing the objects and writing them into the cache. For each technology I spin up one instance on my local machine and then wrote a simple Java main method that assigns the files to a thread pool where each thread would write one object to the cache. I excluded the deserialization and and encoding logic out of the benchmark so I only surrounded the cache “set” call with timestamps. So the thread looks like this:
Object obj; try ( FileInputStream fin = new FileInputStream(inputFile); ObjectInputStream ois = new ObjectInputStream(fin) ) { while ((obj = ois.readObject()) != null) { long t0 = System.nanoTime(); cache.set(getKey(obj), obj); long storeTime = System.nanoTime() - t0; } } catch (EOFException ex) { System.out.println(inputFile.getName() + " completed."); } catch(Exception ex){ ex.printStackTrace(); }
Here are my findings:
Redis
# of clients (threads) | Throughput (stores/s) | Latency (median) | Latency (95th) | Latency (99th) |
---|---|---|---|---|
1 | 5,124 | 138 us | 293 us | 345 us |
8 | 27,533 | 45.8 us | 260 us | 278 us |
16 | 28,569 | 257 us | 505 us | 713 us |
32 | 28,416 | 718 us | 1.18 ms | 1.64 ms |
Interesting note: This is not surprising since Redis is single-threaded but it does look like that there is a hard upper limit to throughput (~28K stores/s) since at some point the single Redis thread/CPU core will be saturated.
Memcache
# of clients (threads) | Throughput (stores/s) | Latency (median) | Latency (95th) | Latency (99th) |
---|---|---|---|---|
1 | 11,995 | 50.3 us | 91.1 us | 117 us |
8 | 57,139 | 95.0 us | 147 us | 219 us |
16 | 42,174 | 101 us | 197 us | 303 us |
32 | 84,348 | 261 us | 324 us | 427 us |
Note:
I am aware that since Redis is single-threaded this isn’t completely a fair benchmark since the suggested way to increase Redis concurrency is to create more instances of Redis and spread out the writes across the instance but since we will have significantly more writing clients than cache hosts, it is important to me that the cache will be able to handle many concurrent writes. There comes a point when creating more instances of Redis becomes an operational nightmare but perhaps one day it is worth investigating this benchmark over more Redis instances. In any case this is my attempt to understand concurrent write loads on a single instance of each cache technology.
Conclusions
The system that I’ll be building will be using the cache in a very simple way: storing an objects then sometimes getting the object. The cache won’t even need update support since the system will be write once-and-for-all. Since for now this system will be used to store and get an object as-is without any need for partial updates it doesn’t look like we’ll need the rich feature set of Redis.
Additionally, it looks like for our workload memcache performs much faster than Redis, even in the single-client case.
I am indeed concerned about memcache’s simple memory management that may result in fragmented memory but I think I may have a solution for that (learned from Facebook’s usage of memcache). But that will be another post.
In conclusion it seems pretty convincing to me that memcached will be the better option for my use case given the performance as well as the simplicity of our workload.