Collections provide a way to store arbitrary objects in a structured fashion, and we all know how useful they are in everyday programming. The .NET class library offers an embarrassment of collection riches—a bewildering selection of collection objects, each with a somewhat specialized purpose. Having more choices may mean more flexibility, but it means more complexity as well. So it's in your interest to have a good understanding of when the use of each flavor of collection is appropriate. Follow along with me as I sift through .NET's collection of collections.
The .NET collection defined
From a .NET perspective, a collection could be defined as an object that implements one or more of the System.Collections.ICollection, System.Collections.IDictionary, and System.Collections.IList interfaces. This definition leads to my classification of the "built-in" collections found in the System.Collections namespace into three broad categories:
- Ordered collections: Collections that implement only the ICollection interface are usually distinguished by the fact that insertion order controls the order in which objects may be retrieved from the collection. The System.Collections.Stack and System.Collections.Queue classes are both examples of ICollection collections.
- Indexed collections: IList-implementing collections are distinguished by the fact that their contents can be retrieved via a zero-based numeric index, like an array. The System.Collections.ArrayList object is one example of an indexed collection.
- Keyed collections: Collections implementing the IDictionary interface contain items that can be retrieved by an associated key value of some kind. The contents of IDictionary collections are also usually sorted in some fashion based on the key valu and can be retrieved in sorted order by enumeration. The System.Collections.HashTable class implements the IDictionary interface.
As you can see, the functionality of a given collection is very much controlled by the specific interface or interfaces it implements. If you don't have much exposure to object-oriented programming, that may seem awfully confusing, if not entirely pointless. You should know, however, that building an object's functionality in this manner not only gives families of similar objects similar sets of method signatures, but it also allows them to be treated as essentially the same class when necessary—a technique known as polymorphism among OOP initiates.
Guided tour of System.Collections
The System.Collections namespace contains six built-in, generic collections that you can use in your applications. A few other, more specialized collections are found in System.Collections.Specialized, and you may find them useful in certain circumstances. With a few exceptions, each of these specialized collections is similar in functionality to one of the built-in collections. Let's take a look at each of the generic collections and a few of the less esoteric specialized collections.
Stacks and queues
The System.Collections.Stack and System.Collections.Queue classes, which both implement the ICollection interface only, hold items of type System.Object in the order they were added to the collection. Objects can be retrieved from the collection only in this order: Stacks are last-in, first-out, and queues are first-in, first-out. Typically, you'll want to consider using one of these collections when:
- The order in which items are received and processed is important.
- You can discard an item after processing it.
- You don't need to access arbitrary items in the collection.
The System.Collections.ArrayList class, which implements only IList, could best be described as a hybrid of a normal array and a collection. ArrayLists hold items in the order they were added. Items are assigned an index identifier and can be retrieved in any order via their associated index number. The ArrayList grows as new items are added to it, which makes it more flexible than a normal array. However, an ArrayList has more overhead than a traditional array and is not strongly typed, accepting anything that can be cast to System.Object (in other words, everything).
System.Collections.SortedList, which implements both the IDictionary and ICollection interfaces, is your basic sorted collection, with the most similarity to VB6's Collection object. SortedLists store objects and sort them based on an associated key. They also are the only built-in .NET collection to support retrieval of objects by both index number and key.
The powerful System.Collections.HashTable collection implements both IDictionary and ICollection and can be used to store multiple items of type Object, along with an associated unique string key. Items in a HashTable are sorted in an order determined by a hash code derived from their key. While each object's key value in the collection must be unique, its hash code need not be.
What's a hash code?
A hash code is essentially the result of removing all redundant parts from a piece of data, and it serves as an aid to categorizing or sorting the data.
When an item is added to the collection, HashTable calls the key's GetHashCode method, which all classes inherit from System.Object, to determine its hash code and place in the sort order. You can force the use of a custom hash function either by overriding the GetHashCode method of a class or by passing an object implementing the System.Collections.IHashcodeProvider interface to the HashTable constructor, in which case that object will be used to generate hash codes for all keys added to the collection.
From a performance standpoint, HashTables can retrieve an arbitrary element from the collection very quickly because key searches are limited only to keys with the same hash code, which reduces the number of keys that must be checked to find a match. However, because a hash code must be generated for each object and key pair that's inserted into the collection, item insertions are somewhat expensive. Therefore, HashTables are ideal for situations where a large amount of relatively static data will be repeatedly searched for arbitrary keys.
ListDictionary and HybridDictionary
The ListDictionary and HybridDictionary classes are found in System.Collections.Specialized. Both organize items based on a unique key value, and both implement IDictionary and ICollection. The ListDictionary stores items internally as a linked list and is recommended for collections that won't grow much larger than 10 items. The HybridDictionary uses an internal linked list (actually a ListDictionary) for small collections and will convert that to a HashTable when the collection grows large enough (more than about 10 items, in case you haven't been paying attention) to make the linked list implementation inefficient.
StringCollection and StringDictionary
System.Collections.Specialized.StringCollection and System.Collections.Specialized.StringDictionary are both optimized for storing collections of strings. StringCollection implements both IList and ICollection and is essentially an ArrayList that is strongly typed to accept only strings. StringDictionary is a HashTable that has been strongly typed to accept only strings. StringCollection is ideal for smaller amounts of data that is frequently updated or added to, while StringDictionary is best suited for large amounts of data that will not be frequently added to, such as a HashTable.
System.Collections.Specialized.NameValueCollection is interesting in that it can contain multiple items associated with the same key value, which distinguishes it from all the other built-in collections. Aside from this, it functions similarly to a HashTable, sorting items based on a hash code derived from each item's key and thus has essentially the same advantages and disadvantages.
Trouble in paradise
If there's one problem with the built-in collections provided by the .NET class library, it's that the majority of them store items internally as type System.Object. Now, that's a fine idea from a standpoint of maximum flexibility, but it poses a few problems for the programmer who's using one of these generic collections. First, any time you add a new item to the collection, the runtime has to perform a box (creating a reference to a value type so that it may be referred to as an object) or downcast operation. That's a little inefficient and will create some performance issues on very large collections. Second, any time you access an item in a generic collection, it will be returned as type System.Object, meaning you'll have to upcast it back to its true type to do anything meaningful with it.