Java Weak References

This is just a note about an (in my opinion at least) interesting use case for Java weak references.

Weak References

Java helps the developer by tracking objects and memory and cleaning up unused objects (garbage collection). Long story short, when the JVM decides to run garbage collection, it looks for unreferenced objects and releases their memory so it can be reused for future objects. An “unreferenced” object is one that nothing in the running program can still reach through a pointer (or “reference”). A simple example of an unreferenced object:

Object obj = new Object();
obj = null;

In this trivial example, the call to new creates a new object by allocating some chunk of memory to hold this Object. When the obj variable is set to null, this program no longer holds a reference to the original Object. This means that the next time JVM garbage collection runs, it can find the memory allocated to that Object to be unreferenced and free it.

An ordinary reference, such as one stored in a variable or field with the = operator, is called a “strong” reference in Java. Garbage collection will not clean up an object as long as it is reachable through strong references. A “weak” reference, however, does not protect its object: even though the weak reference exists, garbage collection is free to reclaim the object’s memory. A weak reference is created with a WeakReference object (from java.lang.ref):

WeakReference<Object> ref = new WeakReference<>(new Object());

This creates a weak reference to the newly created/allocated Object, which means that this weak reference will not be enough to prevent this object from being cleaned up in garbage collection. Calling ref.get() returns the object, or null once it has been collected.
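To make the difference concrete, here is a minimal sketch (the class and method names are made up for illustration). It checks that a weak reference keeps returning its object while a strong reference is also held, and then shows what may happen once the strong reference is gone. Note that System.gc() is only a hint to the JVM, so the second result is not guaranteed to be null on every run.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo class, not part of any library.
class WeakRefDemo {
    // Returns true if the weak reference still yields the object
    // while a strong reference to the same object is held.
    static boolean survivesWhileStronglyHeld() {
        Object strong = new Object();
        WeakReference<Object> ref = new WeakReference<>(strong);
        System.gc(); // only a request; the JVM may ignore it
        // "strong" is still in use here, so the object cannot be collected.
        return ref.get() == strong;
    }

    public static void main(String[] args) {
        System.out.println("while strongly held: " + survivesWhileStronglyHeld());

        // No strong reference is kept this time.
        WeakReference<Object> ref = new WeakReference<>(new Object());
        System.gc();
        // Usually null after a collection, but that is up to the JVM.
        System.out.println("after dropping the strong reference: " + ref.get());
    }
}
```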

So what does “will not be enough to prevent this object from being cleaned up” exactly mean? A good way to understand this is with a common use-case for weak references: caches.

An Example

The whole idea of a cache is of course to make some objects more easily (cheaply?) accessible (usually by keeping them in memory) for as long as those objects are needed. This usually results in the age-old problem of which entries to put/retain/clean up in a cache, since if you could fit all your objects in memory then you wouldn’t need a cache. There are of course many cache-retention policies, but one way to manage the cache is to use weak references.

To do this you would create a map of weak references to objects:

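// needs: java.lang.ref.WeakReference, java.util.HashMap, java.util.Map
Map<String, WeakReference<Object>> cache = new HashMap<>();

cache.put("key1", new WeakReference<>(new Object()));

// The key may be missing, so check the reference before dereferencing it
WeakReference<Object> ref = cache.get("key1");
Object val = (ref != null) ? ref.get() : null;
if (val != null) {
    // do stuff with "val"
} else {
    // the object was collected (or never cached): drop the stale entry
    cache.remove("key1");
}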

In this example you will note a few things:

1. The value in the map is a WeakReference to an Object*

2. We are null-checking the actual retrieved Object even though the key exists in the map

3. The key is removed from the map if we’ve discovered the Object to be null

The reason for all this is that, again, a weak reference is not strong enough to prevent the object from being cleaned up by garbage collection, so it is entirely possible that the object has been collected while its key still exists in the map. This works as a cache because as long as there is a strong reference to that object somewhere in the program, the object will not be cleaned up. In other words, when GC is triggered, objects in the cache that are actively being used elsewhere survive, and the rest become eligible for cleanup. (In the snippet above nothing else holds the new Object, so it could be collected at any time; a real program would hand the object to whatever is using it.)
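The three points above can be wrapped up in a small helper. This is only a sketch: WeakValueCache is a hypothetical name, not a JDK class, and it is not thread-safe.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: a map whose values are held only weakly.
class WeakValueCache<K, V> {
    private final Map<K, WeakReference<V>> map = new HashMap<>();

    void put(K key, V value) {
        map.put(key, new WeakReference<>(value));
    }

    // Returns the cached value, or null if the key is absent
    // or the value has already been garbage collected.
    V get(K key) {
        WeakReference<V> ref = map.get(key);
        if (ref == null) {
            return null; // never cached
        }
        V value = ref.get();
        if (value == null) {
            map.remove(key); // collected: drop the stale entry
        }
        return value;
    }

    // Number of entries, including stale ones not yet looked up.
    int size() {
        return map.size();
    }
}
```

Stale entries are only removed when their key is looked up again, so a cache with many keys that are never re-read would still grow. Sweeping them proactively (for example with a ReferenceQueue) is left out to keep the sketch short.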

*BTW this is not a WeakHashMap (also provided in Java) where the key is a weak reference.
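For contrast, here is a minimal sketch of WeakHashMap, where it is the key (not the value) that is weakly referenced. The class and method names are made up for illustration. An entry stays in the map only while something else strongly references its key.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical demo class contrasting weak keys with the weak values used earlier.
class WeakKeyDemo {
    // Returns true if the entry is still present while its key is strongly held.
    static boolean presentWhileKeyHeld() {
        Map<Object, String> metadata = new WeakHashMap<>();
        Object key = new Object();
        metadata.put(key, "some metadata");
        System.gc(); // only a request; the JVM may ignore it
        // "key" is still in use here, so its entry cannot be dropped.
        return "some metadata".equals(metadata.get(key));
    }
}
```

Once the key is no longer strongly reachable, the entry becomes eligible for removal. When that actually happens is up to the garbage collector.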

My interesting use-case

Now on to my use-case. When designing EDC, I recognized that there are times when it is useful to print out logs (using log4j) when debugging. Since EDC is designed to be highly performant, I also didn’t want to do a bunch of String building every time the toString method is called, since the system is supposed to handle on the order of 100K operations per second. So I decided to cache the String representation in the object itself after the first time toString is called, to avoid repeating expensive String manipulations (note that this is only possible because I designed data objects in EDC to be immutable, a good practice for OOP).

However, during normal operation (i.e. not debugging) verbose logging should be turned off, which means I don’t want the heap to be filled with string representations of a bunch of objects. This is where I decided to use WeakReferences! I found that while debugging, I usually have verbose logging turned on for a relatively short period of time, just so I can observe a sample of the traffic. This means that after the verbose logging is turned off (via log4j logging levels), nothing holds on to those string representations any more and they will be cleaned up by GC.

The code looks like this:

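import java.lang.ref.WeakReference;

public class EDCObject {
    // Weakly cached string form; GC may clear it at any time
    private WeakReference<String> str = new WeakReference<>(null);

    @Override
    public String toString() {
        // Hold the result in a local (strong) variable so it cannot be
        // collected between the null check and the return
        String s = str.get();
        if (s == null) {
            s = "string representation of object"; // stands in for the real String building
            str = new WeakReference<>(s);
        }
        return s;
    }
}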
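One caveat about the snippet above: a string literal is interned by the JVM and generally stays reachable for as long as its class is loaded, so the placeholder itself would never really be collected. The pattern only pays off when the string is built at run time. Here is a minimal sketch with a made-up immutable data class (Quote, its fields, and the builds counter are all assumptions for illustration, not EDC code):

```java
import java.lang.ref.WeakReference;

// Hypothetical immutable data object; not taken from EDC.
final class Quote {
    private final String symbol;
    private final long priceCents;

    // Weakly cached string form; the GC may clear it at any time.
    private WeakReference<String> str = new WeakReference<>(null);

    // Counts how often the string was actually built (demo only).
    int builds = 0;

    Quote(String symbol, long priceCents) {
        this.symbol = symbol;
        this.priceCents = priceCents;
    }

    @Override
    public String toString() {
        String s = str.get(); // strong local reference from here on
        if (s == null) {
            builds++;
            // Built at run time, so the result is an ordinary heap object.
            s = new StringBuilder("Quote{symbol=")
                    .append(symbol)
                    .append(", priceCents=")
                    .append(priceCents)
                    .append('}')
                    .toString();
            str = new WeakReference<>(s);
        }
        return s;
    }
}
```

As long as a caller still holds the first result, a second call to toString returns the very same String instance without rebuilding it. Once nothing holds it any more, the cached string can be collected and the next call simply rebuilds it.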

Further Reading: Soft References

Soft references (java.lang.ref.SoftReference) are similar to weak references but a bit stronger. The JVM guarantees that all softly reachable objects are cleared before it throws an OutOfMemoryError, and in practice it tends to hold on to them until memory gets tight, although the exact policy depends on the JVM implementation. Weak references are cleaned up more eagerly, typically the next time a garbage collection cycle finds the object only weakly reachable.
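The API has the same shape as WeakReference, so swapping one for the other is mostly a one-line change. Here is a minimal sketch of a lazily computed, softly cached value (SoftLazy is a hypothetical name, not a JDK class; it is not thread-safe and assumes the factory never returns null):

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

// Hypothetical helper: computes a value on demand and caches it softly.
class SoftLazy<T> {
    private final Supplier<T> factory;
    private SoftReference<T> ref = new SoftReference<>(null);

    SoftLazy(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T value = ref.get(); // strong local reference from here on
        if (value == null) {
            // Not computed yet, or cleared by the GC under memory pressure.
            value = factory.get();
            ref = new SoftReference<>(value);
        }
        return value;
    }
}
```

Whether a soft or a weak reference fits better depends on how expensive the value is to rebuild compared with how much heap you are willing to let it occupy.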