Institute of Electrical and Electronics Engineers
Temporal difference learning methods have been successfully applied to a wide range of stochastic learning and control problems. Besides correctness, one measure of a technique's performance is its learning rate: how quickly it converges, measured as the number of iterations required to reach an optimal solution. The learning rate can be increased by using multiple agents that share their experience. In a software environment, however, the potential speedup from additional agents is limited, because each added agent significantly increases the computational burden, hinders real-time processing, or both.
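The following minimal Python sketch illustrates how shared experience can reduce the number of iterations needed. It is not the method of this paper. It assumes tabular TD(0) prediction on the 5-state random walk from Sutton and Barto's Reinforcement Learning: An Introduction, with every agent writing its updates into one shared value table. The names `run_episode`, `shared_td`, and `rms_error` and the parameters `alpha`, `gamma`, `rounds`, and `seed` are hypothetical choices made for this example.

```python
import math
import random

# Hypothetical setup: the 5-state random walk. Nonterminal states are 1..5,
# episodes start in the middle state (3), stepping off the right end yields
# reward +1, and stepping off the left end yields reward 0.
N_STATES = 5
TRUE_VALUES = [i / (N_STATES + 1) for i in range(1, N_STATES + 1)]  # 1/6 .. 5/6


def run_episode(values, alpha, gamma, rng):
    """Run one episode, applying TD(0) updates to the shared table in place."""
    state = (N_STATES + 1) // 2  # the middle state
    while state not in (0, N_STATES + 1):
        next_state = state + rng.choice((-1, 1))
        reward = 1.0 if next_state == N_STATES + 1 else 0.0
        # values[0] and values[N_STATES + 1] are terminal and never updated (stay 0).
        # TD(0): V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)]
        td_error = reward + gamma * values[next_state] - values[state]
        values[state] += alpha * td_error
        state = next_state


def shared_td(n_agents, rounds, alpha=0.05, gamma=1.0, seed=0):
    """Each round, every agent runs one episode; all agents update one shared table.

    Assumption: agents are simulated one after another here. A real multi-agent
    system would run them in parallel, each in its own copy of the environment.
    """
    rng = random.Random(seed)
    values = [0.0] + [0.5] * N_STATES + [0.0]  # terminals fixed at 0
    for _ in range(rounds):
        for _ in range(n_agents):
            run_episode(values, alpha, gamma, rng)
    return values[1:-1]  # nonterminal state values only


def rms_error(estimates):
    """Root-mean-square error against the known true state values."""
    return math.sqrt(sum((v - t) ** 2 for v, t in zip(estimates, TRUE_VALUES)) / N_STATES)


if __name__ == "__main__":
    for agents in (1, 4, 8):
        print(agents, round(rms_error(shared_td(agents, rounds=20)), 3))
```

After the same number of rounds, the table shared by eight agents typically sits much closer to the true values than the table built by a single agent. The sketch also exposes the cost noted above: each round performs roughly `n_agents` times as many updates, so when the agents are simulated in software the extra computation offsets part of the gain.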