Software Development

Deciding between iterators and lists for returned values in Java

The java.util.Iterator and java.util.List classes essentially do the same thing: They give you access to a sequence of objects. But a key difference gives iterators an edge in certain situations.

The java.util.Iterator and java.util.List classes basically just provide access to a sequence of objects. However, they have one fundamental difference that can make iterator a more attractive method for returned values in some cases: Iterators can be returned and manipulated before the stored data is fully available, whereas a list, or an array for that matter, must be fully populated before it can safely be returned.

Imagine a case where you're drawing objects from a source that produces each object after some nonzero delay. This situation is fairly common in software that involves network I/O or even large ResultSets from databases. When crafting a method to provide the series of objects that represents the product of that slow source, either a list or an iterator fits the bill. Let's try it first with a list.

Using a list
In Listing A, you can see that the getUsers method buffers up the full result set before returning its list. Not only can this take excessive memory, but it can also add unnecessary delay. If the caller of the getUsers method is going to handle the returned objects one at a time—printing them, for example—the creation of a collection is a waste.

Using an iterator
Listing B shows basically the same method using an iterator as a return value. At first, it appears to be a bit more complicated. It uses an anonymous inner class, which some developers can find a little confusing; but otherwise, everything is pretty straightforward. The important thing to note is that no results are taken from the ResultSet before the iterator is returned. The iterator can be returned well before the ResultSet is fully populated. You could speed the process even more by moving the ResultSet creation into the hasNext method of the iterator, so that ResultSet isn't created until hasNext has been called the first time.

Another thing worth pointing out is that since the iterator's next method calls the hasNext method internally, we can guarantee that the ResultSet's next method is called before the first call to the getString method—a requirement of the ResultSet API. Furthermore, the private variable named next in the iterator and the associated check at the top of the hasNext method make calling hasNext repeatedly between calls to the next method a safe action—a requirement of the iterator API.

I chose to omit all synchronization because, like the authors of the Java Collections API, I think it's often more efficient to let the consumer of the iterator handle synchronization if the consumer feels it's necessary. However, if you wanted to make this iterator thread safe, you'd just need to synchronize the hasNext and next methods.

Converting an iterator to a list
Of course, sometimes you need to do more with a sequence of objects than iterate through it. But if you've written your method to return an iterator, the caller need only use simple code like that in Listing C to convert the iterator to a list. There is no way to convert a list to an iterator and regain the ability to access the first data item before the last has been produced.

I'd be remiss if I didn't point out that there's an easier and wholly ineffective way of producing an iterator from a slowly produced data source. In Listing D, you can see a method that has the same method signature, return value, and externally visible behavior as the example in Listing B. The difference is in performance and memory consumption. Listing D pulls all the data into a list and then creates the iterator. This has all the drawbacks of Listing A and none of the benefits.

One further enhancement you can add to the iterator implementation in Listing B is a buffer. The single next variable can be replaced with a LinkedList that stores objects not yet returned through the iterator. A separate worker thread can fill the buffer from the result set, and the iterator's next method can dequeue them. You can even incorporate a high/low watermark design to ensure that the buffer doesn't consume too much memory itself. Your requirements will rarely call for a data path that complicated, but when they do, an iterator will serve you where a list will fail you.

Another cup of Joe
Tell us what area of Java you would like to see covered, or post a comment below.

 
0 comments

Editor's Picks