Multi-Armed Bandit Problems With Heavy Tail Reward Distributions

Provided by: University of California
Topic: Mobility
Date Added: Jul 2011
Format: PDF
In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward models. At each time step, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length T. The essence of the problem is the tradeoff between exploration and exploitation: playing a less explored arm to learn its reward statistics for future benefit, or playing the arm with the largest sample mean at the current time. An approach based on a Deterministic Sequencing of Exploration and Exploitation (DSEE) is developed for constructing sequential arm selection policies.
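The idea of fixing in advance which time steps are spent exploring and which exploiting can be illustrated with a minimal Python sketch. This is not the paper's policy: the function name `dsee_sketch`, the parameter `w`, the logarithmic exploration budget `n * ceil(w * log(t + 1))`, and the choice to estimate means from exploration plays only are all assumptions made for illustration, and the sketch does nothing specific for heavy-tailed rewards.

```python
import math


def dsee_sketch(arms, horizon, w=2.0):
    """Illustrative explore/exploit schedule (assumed, not the paper's exact policy).

    arms: list of zero-argument callables, each returning one reward.
    horizon: number of plays T.
    w: hypothetical constant (> 0) scaling the exploration budget.
    Returns (list of chosen arm indices, total reward).
    """
    n = len(arms)
    sums = [0.0] * n     # reward sums from exploration plays only
    counts = [0] * n     # number of exploration plays per arm
    n_explore = 0        # total exploration plays so far
    total = 0.0
    choices = []

    for t in range(1, horizon + 1):
        # Assumed budget: exploration plays should total about n * w * log(t).
        budget = n * math.ceil(w * math.log(t + 1))
        if n_explore < budget:
            # Exploration step: cycle through the arms round-robin.
            arm = n_explore % n
            reward = arms[arm]()
            sums[arm] += reward
            counts[arm] += 1
            n_explore += 1
        else:
            # Exploitation step: play the arm with the largest sample mean.
            # The budget is at least n here, so every arm has been explored.
            means = [s / c for s, c in zip(sums, counts)]
            arm = max(range(n), key=means.__getitem__)
            reward = arms[arm]()
        total += reward
        choices.append(arm)

    return choices, total


if __name__ == "__main__":
    # Two hypothetical deterministic arms, used only to show the call shape.
    picks, reward = dsee_sketch([lambda: 0.0, lambda: 1.0], horizon=200)
    print(picks.count(0), picks.count(1), reward)
```

Because the exploration steps depend only on the time index and not on observed rewards, the schedule is deterministic. Under the assumed budget, the number of exploration plays grows logarithmically with the horizon, so most plays go to the arm that currently looks best.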
