Eliminate bottlenecks with a buffered iterator

Buffered iterators work in a similar fashion as BufferedInputStreams and can reduce delays in an iterator series. We'll walk you through some sample code line by line to demonstrate the process.

In many cases, a java.util.Iterator is preferable to java.util.List or an array as a return value because an iterator can be returned before the full data set is available from a source. For example, assume we have a server process whose task is to relay a series of objects from a source to a destination. Using a getter method that returns an iterator allows the sending code to begin sending the series off to the final destination while the data is still being received. If, on the other hand, we use a getter that returns an array or a list, the sending code can't start sending until the full series has been received to populate the array or list.

More on iterators in Java
The java.util.Iterator and java.util.List classes essentially do the same thing: they give you access to a sequence of objects. But a key difference gives iterators an edge in certain situations. Read more about iterators and lists.

Returning an iterator that's still being filled saves you from having to wait for the source to produce all the data before returning a reference to the series. However, you can reduce overhead even further. If the source can produce objects faster than the iterator's consumer can take them in, the source could be left waiting for the iterator's consumer to call next() so that it can ready another object for sending. Just as a BufferedInputStream helps alleviate sender delay when dealing with streams, a BufferedIterator can eliminate similar bottlenecks in an iterator series.

Listing A provides the source code for a BufferedIterator. As you probably have guessed, the BufferedIterator reads from the source faster than data is consumed and buffers what hasn't yet been read. Usually, code this tricky would have lots of comments in it, but I left them out, since they’d clutter the source listing.

Since the BufferedIterator is an iterator, it implements the java.util.Iterator interface and honors the contracts laid down in that class's javadoc. The next thing you notice is that it has two constructors: One takes a single iterator, and the other takes the same plus an additional int parameter. Using this class is as easy as using a BufferedInputStream. Simply pass the source iterator as the first parameter to the BufferedIterator constructor, and use the BufferedIterator where you would use the source iterator.

Examining the code
Enough about what it does—let's look at how it works. On line 7, we see the defining of the LinkedList that will serve as our object queue. The LinkedList class's convenient add and removeFirst methods make it ideal for a queue structure. With the queue in place, the class just needs two things: code to enqueue objects from the source and code to dequeue objects out of the iterator interface.

Dequeuing objects
The dequeuing code is the least complicated of the two, so let's look at that first. The standard iterator methods next() and hasNext() handle the bulk of the work here. Both methods are synchronized, which is important because each modifies the nextReturn private field. If only one or the other or neither were synchronized, it would be possible for an object to get dropped from the source iterator's series.

The next() method calls hasNext() on line 66 and, in accordance with the iterator javadoc, throws a NoSuchElementException if no more objects are available now or will be available in the future. If one will be available, hasNext() will have put it into the nextReturn field. The next() method then nulls that field and returns the value that had been stored within.

The hasNext() method is a little more involved. Its primary function is to return a Boolean indicating whether another object is or will be available in the future, but its secondary task of filling the nextReturn field is just as important.

The while loop beginning on line 45 makes sure that it hasn't already done its work and bails out quickly if it has. Assuming that nextReturn is null, it first grabs the monitor of the LinkedList queue using the synchronized block beginning on line 46. With this monitor in hand, it checks to see whether objects are available in the queue. If one is present, it's assigned to nextReturn. If, however, no objects are available, hasNext doesn't have enough information to complete its task. Until it can determine whether more objects will be coming, it calls wait() on the queue and sits until notified by another thread. This blocking represents the case where the consuming of the iterator is outpacing the source's object production rate.

Filling the queue
Let's leave hasNext() alone for a minute and look now at the larger of the two constructors, which begins on line 11. Everything is straightforward until line 14, when we begin an anonymous inner thread. Depending upon whom you ask, anonymous inner threads are either the worst coding idiom since goto, or they're a neat trick. I side with the latter and use one here, though, of course, a new named, top-level class could do the same thing. Notice that although the class defined for the thread is never named, the particular thread instance is given a name, BufferedIteratorFiller, as a parameter to its constructor.

Unnamed thread instances are the bane of debuggers and performance profilers on every project. Make their lives easier: Name your thread instances. In the thread's run() method, you see a fairly basic while loop beginning on line 17. This loop continues as long as the source iterator has objects available. As each one is popped off the iterator (line 18), the monitor for the LinkedList queue is grabbed. Because this monitor is grabbed before any changes are made to the queue, both threads operating on it are in agreement about its state and which has rights to modify it.

When the filler thread has grabbed the queue monitor on line 19, it checks to make sure the size of the queue hasn't exceeded the specified maximum. If there's room in the queue, the item is added. If there's no room in the queue, the filler thread calls the wait() method on the queue, where it will halt until notified by another thread. The waking of this thread from the wait will be handled by the hasNext() method, as explained shortly.

No room in the queue
Looking at the case where there was room on the queue for the object that was removed from the source iterator, we see that with the queue's monitor still in hand, the filler thread first adds the object to the queue and then calls the notify() method on the queue. This notify() is the wake-up call that the hasNext() method has been waiting for in order to check whether more objects are available in the queue for consumption. However, before looking at the hasNext() method's waking, let's finish with the run() method of the filler thread.

The while loop that began on line 17 is exited when the source iterator is completely depleted. The run() method now has one last task to perform. It once again grabs the queue monitor and enqueues a reference to a private field named done. The done variable serves as a sentinel that tells the hasNext() method that no more objects will be forthcoming. After enqueuing the done sentinel and notifying one last time in case the hasNext() thread is waiting, the filler thread run method and life come to an end.

Waiting for more objects
Back in the hasNext() method, we look again at the  wait() call on line 49. Here, we can see the calling thread waiting until more objects are placed into a depleted queue and the queue is notified. That notify comes from the filler thread every time it places another object into the queue. With a newly nonempty queue, the while loop that began on line 47 is exited. An object is removed from the queue and stored in the nextReturn variable. Then, while the calling thread still has control of the monitor for the queue object, a notify() is sent. This is the notify() that wakes the filler thread if it is waiting on line 22 to add more items to the queue. After that notify, all that's left is to make sure the removed object isn't the done sentinel. If it is, false is returned because the receipt of the done object means that the source iterator is depleted. If it's not the done sentinel, true is returned and nextReturn is ready for distribution by the next() method.

Bidirectional communications
Before we leave BufferedIterator behind, let's consider how wait and notify are used. On the same object, we have two threads, each calling wait() and notify(). The filler thread waits to hear that there's room in the queue and uses notify() to say that there are new objects in the queue. The thread calling the hasNext() method waits to hear that objects have been added to the queue and calls notify() to indicate that an object has been removed. It's a tricky setup and deadlock prone, but as long as you save your code, you'll have to write it only once.


Editor's Picks