Date Added: Oct 2009
In this paper, the authors frame the problem of balancing exploration against exploitation in a cognitive-engine-controlled multi-antenna communication system as a classical multi-armed bandit problem. They then apply the ε-greedy strategy and the Gittins index method to the problem in a system with no prior information. Results show that the Gittins index assuming a normal reward process had the best overall performance, compared with the Gittins index under a Bernoulli reward process and the ε-greedy strategy. The ε-greedy strategy was found to be more consistent but less efficient in most cases; the exception was the combination of a low number of trials and a low SNR, where it outperformed the other methods.
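For readers unfamiliar with the ε-greedy strategy named above, the following is a minimal sketch of the general technique, not the authors' implementation. It assumes each bandit arm stands for one candidate antenna configuration and that rewards are Bernoulli draws (for example, success or failure of a transmission). The success probabilities, the value of ε, the number of trials, and all function names are hypothetical choices made for illustration; the abstract does not specify them.

```python
import random


def epsilon_greedy(reward_fns, n_trials, epsilon=0.1, seed=0):
    """Run an epsilon-greedy bandit over the given arms.

    reward_fns: list of callables, each taking a random.Random and
                returning a numeric reward (hypothetical arm models).
    Returns (pull counts per arm, estimated mean reward per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    counts = [0] * n_arms    # number of times each arm was pulled
    means = [0.0] * n_arms   # running mean reward of each arm
    total = 0.0

    for _ in range(n_trials):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated mean
            # (ties resolve to the lowest index).
            arm = max(range(n_arms), key=lambda a: means[a])

        reward = reward_fns[arm](rng)
        counts[arm] += 1
        # Incremental update of the running mean.
        means[arm] += (reward - means[arm]) / counts[arm]
        total += reward

    return counts, means, total


def bernoulli_arm(p):
    """Hypothetical arm: reward 1.0 with probability p, else 0.0."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


# Assumed success probabilities for three illustrative configurations.
arms = [bernoulli_arm(0.2), bernoulli_arm(0.5), bernoulli_arm(0.8)]
counts, means, total = epsilon_greedy(arms, n_trials=2000, epsilon=0.1, seed=0)
print("pulls per arm:", counts)
print("estimated means:", [round(m, 3) for m in means])
print("total reward:", total)
```

With a fixed ε, a constant fraction of trials is spent exploring regardless of how much has already been learned. This is one plausible reading of why the abstract describes the strategy as consistent but inefficient, though the sketch itself does not reproduce the paper's results.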