Multi-Armed Bandit Problems With Heavy Tail Reward Distributions

Source: University of California

In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward models. At each time step, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length T. The essence of the problem is the tradeoff between exploration and exploitation: either playing a less explored arm to learn its reward statistics for future benefit, or playing the arm with the largest sample mean at the current time. An approach based on a Deterministic Sequencing of Exploration and Exploitation (DSEE) is developed for constructing sequential arm selection policies.
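The exploration and exploitation tradeoff described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `dsee`, the parameter `w`, and the logarithmic exploration budget are assumptions chosen for illustration, and the plain sample mean used here includes no adjustment for heavy-tailed rewards. The only ideas drawn from the abstract are that time is split deterministically into exploration and exploitation, and that exploitation plays the arm with the largest sample mean.

```python
import math

def dsee(arms, horizon, w=10.0):
    """Illustrative deterministic explore/exploit schedule (hypothetical helper).

    arms    : list of zero-argument callables, each returning one reward
    horizon : number of plays T
    w       : assumed constant (> 0) scaling the exploration budget
    """
    n = len(arms)
    counts = [0] * n      # plays per arm
    sums = [0.0] * n      # cumulative reward per arm
    explored = 0          # exploration plays used so far
    total = 0.0
    for t in range(1, horizon + 1):
        # Assumed schedule: by time t, spend about n * ceil(w * log(t + 1))
        # plays on exploration. The budget is at least n, so every arm is
        # sampled at least once before any exploitation step.
        budget = n * math.ceil(w * math.log(t + 1))
        if explored < budget:
            arm = explored % n            # round-robin over arms
            explored += 1
        else:
            # Exploit: play the arm with the largest sample mean.
            arm = max(range(n), key=lambda i: sums[i] / counts[i])
        reward = arms[arm]()
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts
```

Because the decision to explore depends only on t and not on the observed rewards, the exploration times are fixed in advance; that is the sense in which the sequencing is deterministic in this sketch. A call such as `dsee([lambda: 0.0, lambda: 1.0], 1000, w=2.0)` spends a small, slowly growing share of plays on the worse arm and the rest on the better one.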
Format: PDF
Size: 134.60
Date: Jul 2011