Adaptive Channel Recommendation for Dynamic Spectrum Access
The authors propose a dynamic spectrum access scheme in which secondary users recommend "Good" channels to each other and access the spectrum accordingly. They formulate the problem as an average-reward Markov decision process, show the existence of an optimal stationary spectrum access policy, and explore its structural properties in two asymptotic cases. Since the action space of the Markov decision process is continuous, it is difficult to find the optimal policy by simply discretizing the action space and applying policy iteration, value iteration, or Q-learning. Instead, they propose a new algorithm based on the Model Reference Adaptive Search method and prove its convergence to the optimal policy.
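To illustrate why this family of methods suits a continuous action space, here is a minimal toy sketch of the core idea behind Model Reference Adaptive Search: sample candidate actions from a parameterized distribution, evaluate them, and refit the distribution toward the best performers. This is a heavily simplified stand-in (closer to a cross-entropy-style update than the full MRAS reference-model machinery), and the objective, parameter names, and hyperparameters are all illustrative, not taken from the paper.

```python
import numpy as np

def mras_style_search(objective, mu=0.0, sigma=2.0,
                      n_samples=200, elite_frac=0.1, iters=50, seed=0):
    """Toy model-based search over a continuous action:
    sample actions from a Gaussian, keep the elite fraction,
    and refit the Gaussian to the elites. (A simplification of
    Model Reference Adaptive Search; not the paper's algorithm.)"""
    rng = np.random.default_rng(seed)
    n_elite = max(1, int(n_samples * elite_frac))
    for _ in range(iters):
        candidates = rng.normal(mu, sigma, n_samples)   # sample actions
        scores = objective(candidates)                   # evaluate reward
        elites = candidates[np.argsort(scores)[-n_elite:]]
        mu, sigma = elites.mean(), elites.std() + 1e-6   # refit distribution
    return mu

# Hypothetical smooth "average reward" curve, peaked at action p = 0.7
reward = lambda p: -(p - 0.7) ** 2
best = mras_style_search(reward)
```

The key point the sketch captures is that the search never discretizes the action space: the sampling distribution itself concentrates around the optimum, which is what makes this class of methods attractive when discretization plus policy/value iteration is impractical.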