University of California
The authors consider a dynamic pricing problem with an unknown demand model. In this problem, a seller offers prices sequentially to a stream of potential customers and observes either success or failure in each sales attempt. The underlying demand model is unknown and can take one of two possible forms. They show that the problem can be formulated as a two-armed bandit with correlated arms. They develop a dynamic pricing policy based on a likelihood ratio test that achieves finite regret, where regret is defined as the revenue loss relative to the case with a known demand model. They further generalize this policy by introducing an exploration price to reduce the regret.
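To make the idea of a likelihood-ratio-based pricing policy concrete, the following is a minimal sketch and not the authors' exact policy. It assumes two hypothetical exponential demand curves, a greedy rule that offers the revenue-maximizing price of whichever model is currently more likely, and ties broken in favor of model 0. All function and variable names are illustrative. Note that every sale outcome updates the evidence about both models, which is the sense in which the two arms are correlated.

```python
import math
import random

# Hypothetical demand models (assumptions, not from the source):
# probability that a customer buys at a given price.
def demand_0(price):
    return math.exp(-price)        # revenue p*exp(-p) is maximized at p = 1.0

def demand_1(price):
    return math.exp(-2.0 * price)  # revenue p*exp(-2p) is maximized at p = 0.5

PRICE_0, PRICE_1 = 1.0, 0.5        # optimal price under each model

def lrt_pricing(true_demand, horizon, seed=0):
    """Greedy pricing driven by a running log-likelihood ratio.

    Returns (total revenue, final log-likelihood ratio of model 1 vs model 0).
    """
    rng = random.Random(seed)
    log_lr = 0.0   # log[ L(model 1) / L(model 0) ] given observations so far
    revenue = 0.0
    for _ in range(horizon):
        # Offer the optimal price of the currently more likely model.
        price = PRICE_1 if log_lr > 0 else PRICE_0
        sold = rng.random() < true_demand(price)
        if sold:
            revenue += price
        # Each Bernoulli outcome is informative about both models.
        q0, q1 = demand_0(price), demand_1(price)
        if sold:
            log_lr += math.log(q1 / q0)
        else:
            log_lr += math.log((1.0 - q1) / (1.0 - q0))
    return revenue, log_lr

if __name__ == "__main__":
    rev, llr = lrt_pricing(demand_1, horizon=2000)
    print(f"revenue={rev:.1f}, final log-likelihood ratio={llr:.1f}")
```

In this sketch the sign of the log-likelihood ratio decides which price is offered. An exploration price, as mentioned in the abstract, would be a third price offered while the evidence is still inconclusive; it is omitted here because the abstract does not specify how it is chosen.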