Dictionary Collections

9/24/2005 9:24:42 PM

Dictionary Collections

The Base Class Library team silently introduced a new tree-based collection called SortedDictionary<K,V> into the framework. This collection is based on Red-Black trees, with guaranteed O(log n) operations for nearly all operations.

For non-ordered dictionaries, Dictionary<K,V>, based on a hash table, is still a better bet, because most operations average constant time. In addition, it has much better locality and less overhead. However, constant-time performance is not guaranteed; although unlikely, worst case time is still linear for objects with poorly distributed hash functions. This is especially true when hashing heterogeneous objects, since most objects have a default instance that returns a hash value of zero and, somewhat less often, these objects may return consecutive integer values starting from zero or one.

For ordered collections, Dictionary<K,V> may also be a better bet if insertion order and sort order are the same. Last time I checked during Beta 1, insertion ordering was preserved, provided no values were previously removed.

For nicely distributed hash values, I use the following function below adapted from Cormen’s famous book, Introduction to Algorithms, published by MIT Press.

public static int CreateHashCode(int value, long range)
    const long GoldenRatioBits = 2654435769;     Debug.Assert(range>0 && range <= 1L << 32);     uint hash = (uint)(value * GoldenRatioBits);     return (int)((hash * range) >> 32);             } public static int CreateHashCode(int value) {     return CreateHashCode(value, 1L << 31); }

The idea here is to take a pre-existing value and multiply it by the golden ratio, (1+sqrt(5))/2, then multiply the resulting fraction by the desired range of the new hash value--in this case, the range of positive 32-bit integer values. Other floating point values may do, but the golden ratio has some optimal properties. (In this case, GoldenRatioBits is the fractional part of the golden ratio, multiplied by 2^32. Using floating-point multiplication might have made the code more understandable but slightly slower.)






Net Undocumented is a blog about the internals of .NET including Xamarin implementations. Other topics include managed and web languages (C#, C++, Javascript), computer science theory, software engineering and software entrepreneurship.

Social Media