Decentralized Online Learning Algorithms for Opportunistic Spectrum Access

Date Added: Jul 2011
Format: PDF

The fundamental problem of multiple secondary users contending for opportunistic spectrum access over multiple channels in cognitive radio networks has recently been formulated as a Decentralized Multi-Armed Bandit (D-MAB) problem. In a D-MAB problem there are M users and N arms (channels), each of which offers i.i.d. stochastic rewards with unknown means so long as it is accessed without collision. The goal is to design a decentralized online learning policy that incurs minimal regret, defined as the difference between the total expected reward accumulated by a model-aware genie and the total expected reward obtained by all users applying the policy.
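
The regret definition above can be illustrated with a minimal Python sketch. It is not the policy proposed in the paper. It rests on assumptions that the abstract does not state: the genie places the M users on the M channels with the highest means, a channel chosen by more than one user pays nothing in that slot, and the policy being scored is a naive one in which every user picks a channel uniformly at random. All function names and the example means are hypothetical.

```python
import random


def genie_expected_reward(means, num_users, horizon):
    """Expected total reward of a genie that knows the channel means.

    Assumption: the genie puts one user on each of the num_users best channels.
    """
    top = sorted(means, reverse=True)[:num_users]
    return horizon * sum(top)


def expected_slot_reward(means, choices):
    """Expected reward of one time slot, given each user's chosen channel.

    Assumed collision model: a channel picked by more than one user pays nothing.
    """
    counts = {}
    for channel in choices:
        counts[channel] = counts.get(channel, 0) + 1
    return sum(means[ch] for ch, n in counts.items() if n == 1)


def regret_of_random_policy(means, num_users, horizon, seed=0):
    """Estimate the regret of a naive policy (not the paper's policy).

    Each user independently picks a channel uniformly at random in every slot.
    The expectation over rewards is exact; the policy's own randomness is
    sampled once, so the result is an estimate.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(horizon):
        choices = [rng.randrange(len(means)) for _ in range(num_users)]
        total += expected_slot_reward(means, choices)
    return genie_expected_reward(means, num_users, horizon) - total


if __name__ == "__main__":
    channel_means = [0.9, 0.5, 0.2]  # hypothetical, unknown to the users
    print(regret_of_random_policy(channel_means, num_users=2, horizon=1000))
```

Under these assumptions the random policy never learns, so its regret grows linearly with the horizon; a learning policy aims for slower growth.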