VDCBPI: An Approximate Scalable Algorithm for Large POMDPs
Source: University of Toronto
Existing algorithms for discrete partially observable Markov decision processes (POMDPs) can solve problems of at most a few thousand states, due to two important sources of intractability: the curse of dimensionality and the complexity of the policy space. This paper describes a new algorithm (VDCBPI) that mitigates both sources of intractability by combining the Value Directed Compression (VDC) technique with Bounded Policy Iteration (BPI). The scalability of VDCBPI is demonstrated on synthetic network management problems with up to 33 million states.
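As a rough illustration of the compression idea: lossless value-directed compression can be viewed as finding the smallest subspace that contains the reward vector and is closed under the POMDP's transition-observation dynamics, a Krylov-style construction. The sketch below is our own minimal rendering of that idea (the function name and the Gram-Schmidt bookkeeping are illustrative assumptions, not the paper's implementation), and it omits the lossy/approximate variants and the BPI component entirely:

```python
import numpy as np

def krylov_basis(r, mats, tol=1e-8):
    """Smallest subspace containing reward vector r and closed under
    each matrix in mats (e.g. combined transition-observation maps).
    The dimension of the returned basis is the size of the lossless
    value-directed compression of the state space."""
    basis = [r / np.linalg.norm(r)]
    frontier = [basis[0]]
    while frontier:
        v = frontier.pop()
        for M in mats:
            w = M @ v
            # Gram-Schmidt: remove components already spanned by the basis
            for b in basis:
                w = w - (b @ w) * b
            n = np.linalg.norm(w)
            if n > tol:  # genuinely new direction: grow the basis
                w = w / n
                basis.append(w)
                frontier.append(w)
    return np.stack(basis, axis=1)  # columns span the compressed space
```

When the dynamics barely mix reward-relevant state features, the basis stays small regardless of the raw state count, which is the structural property VDC exploits on the large network management problems.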