Date Added: Sep 2012
Finding an optimal sensing policy for a particular access policy and sensing scheme is a laborious combinatorial problem that requires the system model parameters to be known. In practice, the parameters or the model itself may not be completely known, which makes reinforcement learning methods appealing. In this paper, a non-parametric reinforcement learning-based method is developed for sensing and accessing multi-band radio spectrum in multi-user cognitive radio networks. A suboptimal sensing policy search algorithm is proposed for a particular multi-user multi-band access policy and the randomized Chair-Varshney rule. The randomized Chair-Varshney rule is used to reduce the probability of false alarms under a constraint on the probability of detection that protects the primary user.
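
As a rough illustration of the fusion step named in the abstract, the following Python sketch computes the standard Chair-Varshney log-likelihood-ratio statistic from binary local sensing decisions and randomizes the outcome only when the statistic equals the threshold. The function names, the per-sensor detection and false-alarm probabilities `p_d` and `p_f`, the `threshold`, and the tie probability `gamma` are illustrative assumptions, not code or values from the paper. How the paper selects the threshold and randomization to satisfy its detection constraint is not stated in the abstract.

```python
import math
import random


def chair_varshney_statistic(decisions, p_d, p_f):
    """Log-likelihood ratio of binary local decisions (1 = 'primary user present').

    p_d[i] and p_f[i] are the assumed detection and false-alarm
    probabilities of sensor i; both must lie strictly between 0 and 1.
    """
    total = 0.0
    for u, pd, pf in zip(decisions, p_d, p_f):
        if u == 1:
            total += math.log(pd / pf)
        else:
            total += math.log((1.0 - pd) / (1.0 - pf))
    return total


def randomized_fusion(decisions, p_d, p_f, threshold, gamma, rng=random):
    """Decide 1 above the threshold, 0 below it, and 1 with probability gamma on a tie."""
    stat = chair_varshney_statistic(decisions, p_d, p_f)
    if math.isclose(stat, threshold, abs_tol=1e-12):
        # Hypothetical tie-breaking step: randomize only at the threshold.
        return 1 if rng.random() < gamma else 0
    return 1 if stat > threshold else 0
```

Because the statistic takes only finitely many values, randomizing at the threshold lets the fusion center reach operating points between those available to a deterministic threshold test. For example, `randomized_fusion([1, 1, 0], [0.9, 0.8, 0.7], [0.1, 0.1, 0.1], threshold=0.0, gamma=0.5)` returns a global decision for three cooperating sensors.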