Working with the GC instead of against it – System.WeakReference in .NET

While garbage collection isn’t the only thing that sets .NET above C++ development in my mind, it is a major factor.  The garbage collector in .NET is a wonderful thing.  It makes development much cleaner, safer, and more enjoyable in general.  Not only do you get a great boost in developer productivity, you also get a level of safety that just makes writing memory-intensive code more fun.

The GC in the CLR is fantastic – until it isn’t.  As much as I love the garbage collector, it also can have some pretty nasty side effects, notably causing memory leaks that are sometimes difficult to trace and fairly painful to correct.

The problem with the GC is in how it works.  There is no mechanism in .NET to explicitly free managed memory: no delete statement, no free method, etc.  This is particularly difficult for ex-C/C++ developers to accept.  In those languages, any time you allocate memory (i.e., malloc in C or new in C++), you’d expect that, somewhere, you’re going to have a call to free or delete to clean up after yourself.  C++ improved dramatically with the introduction of smart pointer libraries, such as the nice smart pointers in boost.  These provide a sense of ownership and build on the RAII pattern.  However, competent C++ programmers still always have to think in terms of allocation and deallocation of memory.

.NET changes the rules with its introduction of a garbage collector.  Allocations in .NET are handled by the CLR; the garbage collector tracks references internally, and objects are automatically cleaned up at some point after they are no longer in use.  Note that last statement; it’s a very important difference in .NET.  Memory is released at some time after no references to the object remain, but you have no direct control over when that happens (outside of forcing a garbage collection via GC.Collect(), which is not recommended).

A small side note: Many C# and VB.NET developers confuse IDisposable with memory management.  IDisposable is about unmanaged resources, not memory.  Calling Dispose on an object does not release any (managed) memory associated with it.  Granted, native resources being wrapped by your IDisposable type may clean up and release memory at this point, but any managed types will still reside in memory after Dispose is called on an object.  It is possible to Dispose() of an object, and never release its memory!
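To make the side note concrete, here is a minimal sketch.  The ImageBuffer type is hypothetical and exists only for illustration; the point is that calling Dispose() does nothing to the managed array the object holds.

```csharp
using System;

// Hypothetical type for illustration only; not from any real library.
sealed class ImageBuffer : IDisposable
{
    // Managed memory owned by this object (1 MB).
    private readonly byte[] data = new byte[1024 * 1024];

    public bool IsDisposed { get; private set; }
    public int Length => data.Length;

    public void Dispose()
    {
        // A real implementation would release unmanaged resources here
        // (file handles, native buffers, and so on).
        // The managed array above is not touched by this call.
        IsDisposed = true;
    }
}

static class DisposeDemo
{
    static void Main()
    {
        var buffer = new ImageBuffer();
        buffer.Dispose();

        // 'buffer' is still a live reference, so the GC cannot reclaim
        // the object or its 1 MB array, even though it has been disposed.
        Console.WriteLine(buffer.IsDisposed); // True
        Console.WriteLine(buffer.Length);     // 1048576
    }
}
```

The array only becomes eligible for collection once nothing reachable refers to the ImageBuffer instance any more.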

The only way that managed memory is released is by the garbage collector.  The garbage collector tracks references to all of the objects it allocates.  If, at some point, it finds an object that cannot be reached from any root (a static field, a local variable on a thread’s stack, and so on), it will clean up the object and release its memory.  As long as any reachable object in your program holds a reference to an object, that object’s memory will not be released.  This includes a field in one of your objects, which refers to another object, which in turn has a field referring to the object in question – as long as there is a path from a root to the object in question, the GC will leave it alone.

The CLR is assuming you know what you want in this case – if you refer to an object, you must want to use it at some point, so it shouldn’t release it.  However, this also means being careful to clear long-lived references (set them to null, or remove them from collections) if you’re never going to use them again.  Sometimes the need for this is less than obvious.  Take this sample case:

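using System.Collections.Generic;
using System.Windows.Forms;

// MyClass and Form1 are assumed to be defined elsewhere in the project.
class Program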
{
    private static Dictionary<string, MyClass> dictionary = new Dictionary<string, MyClass>();
    static void Main(string[] args)
    {
        dictionary.Add("Element1", new MyClass());
        dictionary.Add("Element2", new MyClass());

        // Do some work using dictionary
        dictionary.Remove("Element1");

        // Start the main application loop...
        Application.Run(new Form1());
    }
}

In this case, we have a static variable (which will automatically always be rooted), which holds a dictionary.  We added a couple of instances to our dictionary, but only removed one of them.  In this case, the MyClass instance in the dictionary keyed with “Element2” will never be released – it’s sitting there in a static variable for the life of the program.  If this wasn’t intended, it’s effectively a memory leak.

Static objects aren’t the only place this happens – it’s frequently a problem with events.  When a listener subscribes to an event, the “source” of the event stores a delegate that references the listener.  Even if you Dispose() of the listener, it will never get collected as long as it is still subscribed to the event and the source object is alive.  This is another common cause of memory leaks in .NET applications.
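To make the direction of the reference concrete, here is a minimal sketch.  Publisher and Listener are hypothetical types, and unsubscribing inside Dispose() is just one common convention, not the only option.  The publisher’s delegate list is what holds the listener, so the listener stays reachable for as long as it is subscribed and the publisher is alive.

```csharp
using System;

// Hypothetical types for illustration only.
class Publisher
{
    public event EventHandler SomethingHappened;

    public void Raise() => SomethingHappened?.Invoke(this, EventArgs.Empty);
}

class Listener : IDisposable
{
    private readonly Publisher publisher;

    public int Calls { get; private set; }

    public Listener(Publisher publisher)
    {
        this.publisher = publisher;
        // From here on, the publisher's delegate list references this listener.
        publisher.SomethingHappened += OnSomethingHappened;
    }

    private void OnSomethingHappened(object sender, EventArgs e) => Calls++;

    public void Dispose()
    {
        // Unsubscribing removes the publisher's reference to this listener,
        // so the listener can be collected once nothing else refers to it.
        publisher.SomethingHappened -= OnSomethingHappened;
    }
}

static class EventDemo
{
    static void Main()
    {
        var publisher = new Publisher();
        var listener = new Listener(publisher);

        publisher.Raise();   // delivered to the listener
        listener.Dispose();  // unsubscribes
        publisher.Raise();   // no longer delivered

        Console.WriteLine(listener.Calls); // 1
    }
}
```

If Dispose() did not unsubscribe, a long-lived publisher would keep every listener it ever had alive.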

In most situations, understanding how the GC tracks object references, and making sure to always remove them when you no longer need to keep a reference to an object is good enough – you’ll avoid leaking memory, and life is good again.  However, there are rare situations where the garbage collector’s normal behavior can cause some nasty side effects.

In particular, there are times when it is advantageous to keep a reference to an object, but you still want to allow the garbage collector to reclaim the object if necessary.  This is frequently the case with objects that use a lot of memory, but can easily be recreated if needed.  In this case, the CLR provides a type specifically to handle this situation: System.WeakReference.

WeakReference exists to support weak references to managed objects.  MSDN’s description of WeakReference:

A weak reference allows the garbage collector to collect an object while still allowing an application to access the object. If you need the object, you can still obtain a strong reference to it and prevent it from being collected.
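As a minimal sketch of the API itself: the non-generic WeakReference exposes the object through its Target property, and the generic WeakReference<T> (available since .NET Framework 4.5) exposes it through TryGetTarget.  The variable names here are only for illustration.

```csharp
using System;

static class WeakReferenceBasics
{
    static void Main()
    {
        var data = new byte[1024];           // strong reference
        var weak = new WeakReference(data);  // does not keep 'data' alive on its own

        // Copy Target into a local: that local is a strong reference,
        // so the object cannot disappear while we are using it.
        object target = weak.Target;
        Console.WriteLine(ReferenceEquals(target, data)); // True

        // Generic variant: no cast needed.
        var typed = new WeakReference<byte[]>(data);
        if (typed.TryGetTarget(out byte[] alive))
        {
            Console.WriteLine(alive.Length); // 1024
        }

        // Make sure the strong reference is considered live up to this point.
        GC.KeepAlive(data);
    }
}
```

Checking IsAlive and then reading Target is best avoided, since a collection could happen between the two calls; reading Target once and testing the result for null sidesteps that race.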

There are times when this is very helpful – the example MSDN uses is a cache.  You can set up a cache in a dictionary (similar to what I included above), but instead of storing the class directly as the value, you store a weak reference to the object.  The code will change to something more like:

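using System;
using System.Collections.Generic;
using System.Windows.Forms;

// MyClass and Form1 are assumed to be defined elsewhere in the project.
class Program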
{
    private static Dictionary<string, WeakReference> dictionary = new Dictionary<string, WeakReference>();
    static void Main(string[] args)
    {
        dictionary.Add("Element1", new WeakReference(new MyClass()));
        dictionary.Add("Element2", new WeakReference(new MyClass()));

        // Do some work using dictionary, except now, this is required:
        MyClass element1 = dictionary["Element1"].Target as MyClass;
        // We need to make sure element1 has not been collected by the GC - if it has, we'd need to recreate it
        if (element1 == null)
        {
            // Recreate element1
        }
        // Use element1 here

        dictionary.Remove("Element1");

        // Start the main application loop...
        Application.Run(new Form1());
    }
}

Note the differences here from our original – now, instead of saving MyClass instances directly in a Dictionary, we’re saving WeakReference instances which contain the reference to the MyClass instance.  This provides us direct access to the elements (via WeakReference.Target), but still allows the GC to collect the object and free its memory if required.  Unfortunately, this requires a little more effort on our part, since we now have to check that the reference is still valid (via the null check), and recreate the element if needed.  This is critical – you have no way of predicting the timing of the GC.  It is possible to have the GC collect immediately after you add the element to the dictionary – the first MyClass instance may be collected before the second one is even added.  However, this does allow the GC the flexibility of cleaning up these resources – which means that now we will no longer leak the “Element2” instance of MyClass (only the small WeakReference entry itself stays in the dictionary until it is removed).
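The check-and-recreate step tends to get repeated everywhere the cache is read, so it can be wrapped in a small helper.  The sketch below is one possible shape, not a definitive implementation: the WeakCache name, its GetOrCreate method, and the factory delegate are all assumptions made for illustration, and it uses the generic WeakReference<T> rather than the non-generic type from the sample above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; the name and API are assumptions.
class WeakCache<TKey, TValue> where TValue : class
{
    private readonly Dictionary<TKey, WeakReference<TValue>> entries =
        new Dictionary<TKey, WeakReference<TValue>>();

    public TValue GetOrCreate(TKey key, Func<TKey, TValue> factory)
    {
        if (entries.TryGetValue(key, out WeakReference<TValue> weak) &&
            weak.TryGetTarget(out TValue value))
        {
            // Still alive: 'value' is now a strong reference to it.
            return value;
        }

        // Never cached, or already collected: recreate and store weakly.
        value = factory(key);
        entries[key] = new WeakReference<TValue>(value);
        return value;
    }

    public bool Remove(TKey key) => entries.Remove(key);
}

static class WeakCacheDemo
{
    static void Main()
    {
        var cache = new WeakCache<string, object>();

        object first = cache.GetOrCreate("Element1", _ => new object());
        object second = cache.GetOrCreate("Element1", _ => new object());

        // 'first' is a strong reference, so the cached object could not
        // have been collected between the two calls.
        Console.WriteLine(ReferenceEquals(first, second)); // True

        GC.KeepAlive(first);
    }
}
```

Note that the dictionary still holds one small WeakReference<TValue> per key after the target is collected, so a long-running cache would also want to prune dead entries from time to time.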

Granted, this situation is a bit contrived – but it is fairly easy to imagine a much more elaborate situation in which this may be useful.

For example, suppose you have a business application, and one screen of your application constructs a large chart and a user interface to edit the business objects which construct the chart.  Constructing this requires a large amount of data taken from the backend.  The first time the user opens up this screen, the application is going to have to go out to the data access layer and query the DB for potentially a large amount of data, which is a slow operation.  The application then takes the data from the DAL, builds the chart, and presents it to the user.

If the user moves off to another screen, you have two options.  You can keep the data around, or you can completely release your references to it and allow it to be cleaned up.  Both have advantages and disadvantages.

If you keep the data around, and the user comes back to your screen, you can completely avoid the large round trip to the backend.  This provides many benefits, such as keeping the application much more responsive and saving on your infrastructure costs since you’re hitting the backend less frequently.  However, this also means keeping that large amount of information in memory, potentially causing bloat or other problems on the user’s system.

If you instead release your references, you allow the GC to potentially clean up that memory when it deems it necessary.  If this is a large amount of data, that may prevent out of memory exceptions, and keep the application running more smoothly.  However, if the user comes back to the screen, the system will need to do another full round trip to regenerate the data.

WeakReference provides a third alternative – when the user moves to the new screen, you can hand over your reference to the data to a WeakReference instance, and keep a handle to that instead.  The data is then eligible for collection: the next time the GC collects the generation the data lives in, it will reclaim it.  If, however, the user had used the data for a while (which means it’s more likely to have been promoted to generation 1 or 2; for details, refer to MSDN), the GC will be less likely to clean up that memory right away, since the older generations are collected less often.  If the user comes back to that screen before the GC cleans up the memory, you can just use it immediately by putting it back into a strong reference (a normal variable), and completely avoid the round trip to the backend.
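A rough sketch of that hand-off might look like the following.  Everything here is hypothetical: ChartData, Dal.LoadChartData, and the OnActivated/OnDeactivated method names stand in for whatever your application actually uses.

```csharp
using System;

// Hypothetical stand-ins for the chart data and the data access layer.
class ChartData
{
    public double[] Points = new double[100_000];
}

static class Dal
{
    public static int Queries;

    public static ChartData LoadChartData()
    {
        Queries++;               // count round trips to the "backend"
        return new ChartData();
    }
}

class ChartScreen
{
    private ChartData data;                   // strong reference while the screen is shown
    private WeakReference<ChartData> parked;  // weak reference while the user is elsewhere

    public ChartData Data => data;

    public void OnActivated()
    {
        // Try to promote the weak reference back to a strong one;
        // fall back to the backend if the data has been collected.
        if (parked == null || !parked.TryGetTarget(out data))
        {
            data = Dal.LoadChartData();
        }
        parked = null;
    }

    public void OnDeactivated()
    {
        parked = new WeakReference<ChartData>(data);
        data = null; // drop the strong reference so the GC may reclaim the data
    }
}

static class ScreenDemo
{
    static void Main()
    {
        var screen = new ChartScreen();
        screen.OnActivated();    // first visit: queries the backend
        screen.OnDeactivated();  // user moves to another screen
        screen.OnActivated();    // user returns

        // 1 if the data survived, 2 if a collection reclaimed it in between.
        Console.WriteLine(Dal.Queries);
    }
}
```

The screen never needs to know whether the data survived; it simply asks for it and pays for the round trip only when the GC has already reclaimed it.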

This is a perfect application for WeakReference.  Any time you have an object which uses a large amount of memory (the handle to the business objects in this case), but can be recreated if needed, there is a potential opportunity for using WeakReference.

Get to know the Garbage Collector, and understand how it works.  As .NET developers, it makes our lives much easier, but, as with any tool, understanding its limitations and side effects is critical to using it effectively.

About Reed
Reed Copsey, Jr. - http://www.reedcopsey.com - http://twitter.com/ReedCopsey
