The select() and poll() calls can be powerful tools when you’re multiplexing network sockets. Specifically, they indicate when an operation on an open file descriptor can be executed without blocking. For instance, a programmer can use these calls to know when there is data to be read on a socket. By delegating responsibility to select() and poll(), you don’t have to constantly check whether there is data to be read. Instead, the operating system can put the calling process to sleep inside select() or poll() and wake it when a requested event occurs or a specified timeout has elapsed. This can significantly increase the execution efficiency of a program. (If you are more concerned with performance than portability, we discuss some alternatives to select() and poll() toward the end of the article.)
As you will see, select() and poll() are very similar in functionality. Often, implementations of select() and poll() are mapped onto each other. For instance, in the Apache Portable Runtime, a core component of Apache 2.0, a portable interface is provided that mimics the poll() semantics. On platforms that do not have a native poll() implementation, the poll() semantics are mapped onto select(). On FreeBSD, the libc_r implementation of select() is merely a thin wrapper around the poll() system call.
The Single UNIX Specification, version 2 (SUSv2) defines select() as follows:
int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *errorfds, struct timeval *timeout);
It takes these parameters:
- int nfds - The highest file descriptor in all given sets, plus one
- fd_set *readfds - File descriptors that will trigger a return when data is ready to be read
- fd_set *writefds - File descriptors that will trigger a return when they are ready to be written to
- fd_set *errorfds - File descriptors that will trigger a return when an exceptional condition occurs
- struct timeval *timeout - The maximum period select() should wait for an event
The return value indicates the number of file descriptors (fds) whose requested event has been satisfied. It is 0 if the timeout expired before any event occurred and -1 if an error occurred.
You can’t modify the fd_set structure by changing its value directly. The only portable way to either set or retrieve the value is by using the provided FD_* macros:
- FD_ZERO(fd_set *) - Initializes an fd_set to be empty
- FD_CLR(int fd, fd_set *) - Removes the associated fd from the fd_set
- FD_SET(int fd, fd_set *) - Adds the associated fd to the fd_set
- FD_ISSET(int fd, fd_set *) - Returns a nonzero value if the fd is in the fd_set
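Upon return from select(), each set has been modified in place to contain only the fds whose condition has been met, so FD_ISSET() can be called for each fd of interest to identify whether it is ready. Because the sets are overwritten, they must be rebuilt before every call.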
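As a minimal sketch of the FD_* macros in use, the helper below builds a set from an array of descriptors and computes the nfds argument at the same time. The name build_fd_set() is hypothetical and not part of any standard API, and the sketch assumes a system that provides <sys/select.h> (older SUSv2 systems declare the same macros through <sys/time.h>).

```c
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical helper (not a standard call): fill `set` with the
 * descriptors in fds[0..n-1] and return the value to pass as select()'s
 * nfds argument, that is, the highest descriptor plus one.
 * Returns -1 if any descriptor is negative or does not fit in an fd_set. */
int build_fd_set(const int *fds, size_t n, fd_set *set)
{
    int max_fd = -1;
    size_t i;

    FD_ZERO(set);                       /* always start from an empty set */
    for (i = 0; i < n; i++) {
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE)
            return -1;                  /* FD_SET() would be undefined here */
        FD_SET(fds[i], set);
        if (fds[i] > max_fd)
            max_fd = fds[i];
    }
    return max_fd + 1;                  /* 0 when the array is empty */
}
```

Checking against FD_SETSIZE matters because an fd_set has a fixed capacity, so a descriptor outside that range cannot be stored in it safely.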
With the timeout value, you can specify how long select() will wait for an event. If timeout is NULL, select() will wait indefinitely for an event. If both fields of the timeval structure that timeout points to are set to 0, select() will return immediately rather than wait for any event to occur. Otherwise, timeout defines how long select() will wait. SUSv2 indicates that all compliant implementations will support a timeout of at least 31 days. Check out Listing A for a select() example.
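The three timeout cases can be sketched in a small wrapper that waits for a single descriptor to become readable. This is an illustration under stated assumptions rather than the article's own listing: the name wait_readable_select() and its millisecond argument are hypothetical conveniences, and a negative value is used here to request the NULL (wait forever) behavior.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: wait up to timeout_ms milliseconds for fd to
 * become readable. A negative timeout_ms waits indefinitely and 0 polls
 * without waiting. Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable_select(int fd, long timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    struct timeval *tvp = NULL;         /* NULL means wait indefinitely */
    int ready;

    if (fd < 0 || fd >= FD_SETSIZE) {
        errno = EINVAL;                 /* fd cannot be stored in an fd_set */
        return -1;
    }

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    if (timeout_ms >= 0) {
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        tvp = &tv;
    }

    /* nfds is the highest descriptor in any set, plus one */
    ready = select(fd + 1, &readfds, NULL, NULL, tvp);
    if (ready < 0)
        return -1;                      /* errno was set by select() */
    return (ready > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
```

A caller would typically follow a return value of 1 with read() or recv() on the same descriptor. Some implementations modify the timeval on return, which is one reason the structure is rebuilt on every call here.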
The poll() call attempts to consolidate the arguments of select() and provides notification of a wider range of events. SUSv2 defines poll() as follows:
int poll(struct pollfd fds[ ], nfds_t nfds, int timeout);
It takes these parameters:
- struct pollfd fds[ ] - An array of pollfd structures
- nfds_t nfds - The number of entries in the fds[ ] array
- int timeout - How long poll() should wait for an event to occur (in milliseconds)
The return value indicates how many fds had an event occur.
A pollfd struct typically includes the following members:
- int fd - Indicates which fd to monitor for an event
- short events - A bitwise field that represents which events will be monitored
- short revents - A bitwise field that represents which events were detected in a call to poll()
SUSv2 details the precise values and meanings of the bits in the events and revents fields. In comparison to select(), poll() allows a greater degree of flexibility in determining what types of events can be processed. In addition to the read, write, and error notifications, poll() also supports explicit recognition of out-of-band and high-priority data.
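To make the bitfield concrete, here is a small sketch that turns a revents value into a readable list of flag names. The helper name describe_revents() is hypothetical. The six flags shown are the commonly available ones from <poll.h>; SUSv2 defines additional flags, such as POLLRDBAND, that are omitted here for brevity.

```c
#include <poll.h>
#include <stdio.h>

/* Hypothetical helper: write the names of the flags set in `revents`
 * into buf, separated by spaces. Output is truncated if buf is too small. */
void describe_revents(short revents, char *buf, size_t len)
{
    static const struct { short bit; const char *name; } flags[] = {
        { POLLIN,   "POLLIN"   },   /* data may be read without blocking */
        { POLLPRI,  "POLLPRI"  },   /* high-priority data may be read */
        { POLLOUT,  "POLLOUT"  },   /* data may be written without blocking */
        { POLLERR,  "POLLERR"  },   /* an error occurred (revents only) */
        { POLLHUP,  "POLLHUP"  },   /* the peer hung up (revents only) */
        { POLLNVAL, "POLLNVAL" },   /* fd is not open (revents only) */
    };
    size_t used = 0;
    size_t i;

    if (len == 0)
        return;
    buf[0] = '\0';
    for (i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        if (revents & flags[i].bit) {
            int n = snprintf(buf + used, len - used, "%s%s",
                             used ? " " : "", flags[i].name);
            if (n < 0 || (size_t)n >= len - used)
                return;             /* buffer full: stop at truncated text */
            used += (size_t)n;
        }
    }
}
```

The last three flags are reported in revents even when they were not requested in events, which is why a caller should test for them after every call rather than only for the bits it asked about.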
Unlike select(), poll()'s timeout is a simple integer that represents how many milliseconds poll() will wait for an event. A special value (usually -1, or the constant INFTIM used by many older systems) specifies that poll() will wait forever for an event. Like select(), a 0 timeout value indicates that poll() should return immediately.
In Listing B, the select() example in Listing A is shown as poll().
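For comparison, waiting for a single descriptor to become readable with poll() might look like the sketch below. The name wait_readable_poll() is hypothetical, and treating a hang-up as "readable" is a design assumption made here so that the caller's next read() can observe end-of-file.

```c
#include <poll.h>

/* Hypothetical helper: wait up to timeout_ms milliseconds for fd to
 * become readable. A timeout of -1 waits indefinitely and 0 returns
 * immediately. Returns 1 if a read will not block (data or hang-up),
 * 0 on timeout, and -1 on error or if the descriptor is invalid. */
int wait_readable_poll(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int ready;

    pfd.fd = fd;
    pfd.events = POLLIN;            /* the only event requested */
    pfd.revents = 0;

    ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;                  /* errno was set by poll() */
    if (ready == 0)
        return 0;                   /* timeout expired with no event */

    if (pfd.revents & (POLLIN | POLLHUP))
        return 1;
    return -1;                      /* POLLERR or POLLNVAL was reported */
}
```

Note that there are no sets to rebuild and no nfds value to compute: the array length is passed directly, and the requested events are left untouched because results are reported in the separate revents field.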
Alternatives to select() and poll()
Although select() and poll() provide multiplexed event notification, other interfaces exist that offer better performance. However, these interfaces are not standardized across platforms. You must weigh the potential performance benefits of using one of them against the loss of portability. We will examine two of these alternatives: Solaris' /dev/poll and FreeBSD's kqueue.
As you will see, both implementations gain their key performance benefits by leveraging the fact that, in the real world, developers continually call select() or poll() with the same fds. To eliminate overhead when passing the same arguments each time, fds that are examined can be cached. This approach also works well when a large number of fds are monitored, because some select() and poll() implementations have scalability issues.
In Solaris 7, Sun introduced the /dev/poll device. To use /dev/poll, you first open /dev/poll as you would a normal file. Then, you construct pollfd structures in a manner similar to the normal poll() call. These pollfd structures are then written to the open /dev/poll file descriptor. For the lifetime of this open handle, /dev/poll will return events according to those pollfd structures. (Note that the special POLLREMOVE flag in the events field of a pollfd structure will remove that fd from /dev/poll's list.) A program retrieves the information from /dev/poll by calling a special ioctl (DP_POLL) with a dvpoll structure. On return, the dvpoll structure identifies the events that have occurred.
Linux support and other resources
A patch exists to add /dev/poll support for the Linux 2.4 series, but it has not yet been accepted into mainstream Linux kernel trees. On the FreeBSD site, you can check man pages for Solaris, Red Hat, and other operating systems.
Introduced in FreeBSD 4.1, FreeBSD's kqueue API is designed to address a wider range of notifications than any of the other alternatives presented here. The kqueue API provides several generic filters that allow it to mimic the poll() semantics (EVFILT_READ and EVFILT_WRITE). However, it also allows notification of file system changes (EVFILT_VNODE), process state changes (EVFILT_PROC), and delivery of signals (EVFILT_SIGNAL). For more information on kqueue, download Jonathan Lemon’s paper (PDF format) from BSDCon 2000, "Kqueue: A generic and scalable event notification facility."
To build on the introduction to select() and poll() presented in this article, I highly recommend Advanced Programming in the UNIX Environment by W. Richard Stevens.