Persistent Performance

8/22/2006 11:10:41 PM

Persistent Performance

There are a number of tricks that I discovered for getting performance out of persistent, immutable data structures. I also recently noticed this MSDN post on Persistent data structures.

Persistent Arrays and Hashtables

Persistent versions of arrays and hashtables do exist with similar constant-time behavior. They won’t be as fast since they obviously do more. Of course, traditional data structures can always be treated as immutable, if we commit to not making any more modification.

Lazy Data

It’s possible to modify immutable data by creating a node that represents the operation and points to the data, since the underlying data is guaranteed not to change. This can turn a lengthy operation into a constant-time one. For example, to construct a difference of two persistent sets, we create a node representing the difference of the two sets rather than actually building an entirely new set. Looking up an element then requires checking to see if the element exists in the first set and is absent in the second.

Modification Box

Modifying a persistent binary tree needn’t actually require log(n) space. Sleator and Tarjan (the same people who invented splay trees) came up with an stunningly elegant technique called modification boxes that adds a small constant amount of space per modification. It blends the two traditional techniques for maintaining persistence (path copying and fat nodes) and comes up with a superior approach to both.

The downside of using a modification box is that rebalancing and other behavior-preserving transformations/optimizations are either no longer possible or significantly harder than path copying.


Red-black trees are typically faster and shallower than treaps, but treaps are much easier to implement. R-B trees have a bounded depth of 2 lg n, while treaps are unbounded (ie, O(n)). The simplicity of treaps makes it straightforward to include more advanced operations such as set operations, even though the algorithms for R-B trees are similar using tree splitting and concatenation. (I noticed that the fast set operations described Blelloch and Miller could have been further optimized from O(m lg(n/m) ) where m <= n by including trivial equality checks; this could result in substantial performance gains if both treaps descended from a common parent. Another observation is the excellent potential for parallelization in their set algorithms.)

R-B Trees require a bit flag, whereas treaps require a priority value. However, the priority need not be stored, but can be obtained from the key’s hashcode. In this case, another optimization opportunity arises. The structure of the treap is completely determined by the elements it contains. This makes equality testing potentially faster and increases sharing opportunities. Some algorithms might be able to exploit this property effectively.

Equality Testing

I mentioned in an earlier post that comparing two immutable objects for equality can be made into a fast operation. Comparing hashcode values provides an instant check for inequality. However, if two equivalent objects reside in different memory locations, a positive test for equality can become very expensive, especially if the object tree is deep. Ideally, we would prefer to keep only one copy of equivalent objects, which we then share so that any two equivalent objects have the same exact reference.

The trick is to define a new Equals(ref object obj1, ref object obj2) function that compares two objects for equality and, in the case that both objects are determined equal, normalizes their references. Normalization makes both references refer to the older object, which is more likely to be shared and located in infrequently collected Gen 2. A crude approach is to compare the generation of each object using GC.GetGeneration(obj). A granular approach is to compare the actual address of each object when the first test fails: A lower address typically corresponds to an older object, especially when both objects are allocated in the same heap segment.

From what I have read and seen, the garbage collector allocates (via VirtualAlloc) a 16MB segment of memory at a time for its heap. When that segment is exhaused, a new current segment is allocated. All current Gen 0 and 1 allocations reside on the current segment; previous segments are Gen2. I suspect that GC’s special handling of pinned objects keeps this statement from being entirely true.

private struct AddressStruct
   public object Obj;
   public IntPtr Ptr;

private static int addressLocation = InitAddressLocation();

private static int InitAddressLocation()
    addressLocation = -1;
    if (GetAddress(null) != 0 || GetAddress("X") == 0)
        addressLocation= 1;
        if (GetAddress(null) != 0 || GetAddress("X") == 0)
            throw new Exception("Assumption of CLR data layout is invalid.");

// This won't always succeed since objects may be
// stored in different GC segments, but this is 
// extremely unlikely because a full GC & more would need to occur
    object obj1 = new object();
    object obj2 = new object();
    Debug.Assert(IsOlder(obj1, obj2));
    Debug.Assert(!IsOlder(obj2, obj1));
    return addressLocation;

public static unsafe long GetAddress(object obj)
    AddressStruct address;
    address.Obj = obj;
    address.Ptr = IntPtr.Zero;
    return (long)(&address.Ptr)[addressLocation];

public static bool IsOlder(object obj1, object obj2)
    if (obj1 == obj2 || obj2 == null) return false;
    if (obj1 == null) return true; 

    while (true)
        int count = GC.CollectionCount(0);
        int cmp = GC.GetGeneration(obj1) - GC.GetGeneration(obj2);
        bool result = (cmp != 0)
            ? cmp > 0
            : GetAddress(obj1) < GetAddress(obj2);
         if (count == GC.CollectionCount(0))
            return result;

public static bool Equals<T>(ref T obj1, ref T obj2)
    where T : class
    if (obj1 == obj2) 
        return true;
    if (obj1 == null || !obj1.Equals(obj2))
        return false;
    if (IsOlder(obj1, obj2))
        obj2 = obj1;
        obj1 = obj2;
    return true;






Net Undocumented is a blog about the internals of .NET including Xamarin implementations. Other topics include managed and web languages (C#, C++, Javascript), computer science theory, software engineering and software entrepreneurship.

Social Media