Date Added: Jan 2010
The authors consider a cognitive radio network with multiple distributed secondary users, where each user independently searches for spectrum opportunities across multiple channels without exchanging information with the others. The occupancy of each channel is modeled as an i.i.d. Bernoulli process with unknown mean. Users that choose the same channel collide, and, depending on the collision model, either none of them or only one of them receives a reward. This problem can be formulated as a decentralized multi-armed bandit problem. The authors measure the performance of a decentralized policy by the system regret, defined as the total reward loss relative to the optimal performance in the ideal scenario where all channel parameters are known to all users and collisions among secondary users are eliminated through perfect scheduling.
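
The regret definition above can be illustrated with a small simulation. The sketch below is not the authors' policy or analysis. It assumes a reward of 1 for each idle channel that is successfully accessed, takes the ideal benchmark to be the expected reward of placing the users on the channels with the largest means, and subtracts the reward realized on a single sample path (an estimate, not the expected regret). The names `optimal_reward`, `slot_reward`, `system_regret` and the `policy` callable are hypothetical, and the policy here ignores observations for simplicity.

```python
import random
from collections import Counter


def optimal_reward(theta, num_users, horizon):
    """Expected reward in the ideal scenario: the users occupy the
    num_users channels with the largest idle probabilities, one user each."""
    best = sorted(theta, reverse=True)[:num_users]
    return horizon * sum(best)


def slot_reward(choices, free, one_wins_on_collision):
    """Total reward collected in one slot (assumed: 1 unit per idle channel used).

    choices[i] is the channel picked by user i; free[k] is True if channel k
    is idle. Under a collision either one colliding user is rewarded
    (one_wins_on_collision=True) or none is (False).
    """
    total = 0
    for channel, n_users in Counter(choices).items():
        if free[channel] and (n_users == 1 or one_wins_on_collision):
            total += 1
    return total


def system_regret(theta, num_users, horizon, policy,
                  one_wins_on_collision=False, seed=0):
    """Ideal expected reward minus the reward realized by `policy` on one run.

    `policy(t, rng)` is a hypothetical callable returning one channel index
    per user for slot t.
    """
    rng = random.Random(seed)
    earned = 0
    for t in range(horizon):
        # Draw the i.i.d. Bernoulli occupancy state of every channel.
        free = [rng.random() < p for p in theta]
        earned += slot_reward(policy(t, rng), free, one_wins_on_collision)
    return optimal_reward(theta, num_users, horizon) - earned


if __name__ == "__main__":
    theta = [0.9, 0.5, 0.2]  # illustrative idle probabilities, not from the paper

    def uniform_policy(t, rng):
        # Two users, each picking a channel uniformly at random.
        return [rng.randrange(len(theta)) for _ in range(2)]

    print(system_regret(theta, num_users=2, horizon=1000, policy=uniform_policy))
```

A policy that always picks the best channels without overlap has regret near zero in this sketch, while uncoordinated random choices lose reward both to poor channels and to collisions.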