
Multi-Armed Bandit Problems With Heavy Tail Reward Distributions


Executive Summary

In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward models. At each time step, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length T. The essence of the problem is the tradeoff between exploration and exploitation: playing a less explored arm to learn its reward statistics for future benefit, versus playing the arm with the largest sample mean at the current time. An approach based on a Deterministic Sequencing of Exploration and Exploitation (DSEE) is developed for constructing sequential arm selection policies.
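The general idea behind a DSEE-style policy can be illustrated with a short sketch: time is partitioned deterministically into exploration steps (arms played round-robin) and exploitation steps (play the arm with the largest sample mean). The schedule below, which explores whenever the number of exploration steps so far falls under a logarithmic function of time, and the parameter `w` are illustrative assumptions, not the paper's exact specification.

```python
import math

def dsee(pull, n_arms, horizon, w=3.0):
    """Illustrative DSEE-style policy (a sketch, not the paper's exact rule).

    Step t is an exploration step whenever the number of exploration steps
    taken so far is below w * log(t + 1); exploration cycles through the
    arms round-robin, exploitation plays the arm with the largest sample mean.
    """
    counts = [0] * n_arms      # plays per arm
    sums = [0.0] * n_arms      # cumulative reward per arm
    explored = 0               # exploration steps taken so far
    total = 0.0
    for t in range(1, horizon + 1):
        if explored < w * math.log(t + 1):
            arm = explored % n_arms  # deterministic round-robin exploration
            explored += 1
        else:
            # exploit: arm with the largest current sample mean
            arm = max(range(n_arms), key=lambda a: sums[a] / counts[a])
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total

if __name__ == "__main__":
    # Toy bandit with three arms; deterministic rewards for clarity.
    means = [0.3, 0.5, 0.8]
    total = dsee(lambda a: means[a], n_arms=3, horizon=5000)
    print(total / 5000)  # average reward approaches the best mean, 0.8
```

Because the exploration epochs are fixed in advance rather than triggered by observed rewards, this deterministic sequencing is what makes the approach amenable to heavy-tailed reward distributions, where sample means converge slowly.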

  • Format: PDF
  • Size: 134.6 KB