Journal of Machine Learning Research (JMLR)
Recently, Bayesian Optimization (BO) has been used to successfully optimize parametric policies in several challenging Reinforcement Learning (RL) applications. BO is attractive for this problem because it incorporates Bayesian prior information about the expected return and exploits this knowledge to select new policies to execute. In effect, the BO framework for policy search addresses the exploration-exploitation trade-off. In this paper, the authors show how to apply BO to RL more effectively by exploiting the sequential trajectory information generated by RL agents. Their contributions can be broken into two distinct, but mutually beneficial, parts.
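To make the BO-for-policy-search loop concrete, here is a minimal sketch: a Gaussian-process surrogate over the expected return of a one-dimensional policy parameter, with an upper-confidence-bound acquisition rule that trades off exploration (high posterior uncertainty) against exploitation (high posterior mean). The synthetic return function, kernel choices, and UCB rule are all illustrative assumptions for this sketch, not the method proposed in the paper under review.

```python
import numpy as np

def rbf_kernel(a, b, length=0.3, var=1.0):
    # Squared-exponential covariance between policy parameter settings.
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(X, y, Xs, noise=1e-4):
    # GP posterior mean and std of the expected return at candidates Xs,
    # conditioned on returns y observed at previously executed policies X.
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    sol = np.linalg.solve(K, Ks)
    mu = sol.T @ y
    var = np.clip(np.diag(rbf_kernel(Xs, Xs)) -
                  np.einsum('ij,ij->j', Ks, sol), 1e-12, None)
    return mu, np.sqrt(var)

def expected_return(theta):
    # Stand-in for executing the policy and observing its return
    # (peaked at theta = 0.7; purely synthetic).
    return np.exp(-(theta - 0.7) ** 2 / 0.05)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 3)          # initial policies tried
y = expected_return(X)
grid = np.linspace(0.0, 1.0, 200)     # candidate policy parameters

for _ in range(15):
    mu, sd = gp_posterior(X, y, grid)
    ucb = mu + 2.0 * sd               # optimism balances explore vs. exploit
    theta = grid[np.argmax(ucb)]      # next policy to execute
    X = np.append(X, theta)
    y = np.append(y, expected_return(theta))

best = X[np.argmax(y)]
print(round(float(best), 2))
```

The key design point matching the review's description is that each new policy is chosen by the acquisition function over the surrogate model, rather than by local gradient steps, so the loop explicitly weighs trying uncertain policies against re-exploiting promising ones.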