Networked Markov Decision Processes With Delays
Source: Stanford University
The authors consider a networked control system in which each subsystem evolves as a Markov Decision Process (MDP). Each subsystem is coupled to its neighbors via communication links over which signals are delayed but otherwise transmitted noise-free. A centralized controller receives delayed state information from each subsystem, and the control action applied to each subsystem takes effect after a certain delay rather than immediately. Such a distributed Markov decision process with inter-subsystem, observation, and action delays can be represented as a Partially Observed Markov Decision Process (POMDP). The authors show that this POMDP is equivalent to an MDP whose observable state consists of a finite number of past states and actions.
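The state-augmentation idea behind this equivalence can be illustrated with a small simulation. The sketch below (hypothetical code, not from the paper; function names and the specific toy MDP are assumptions) tracks a single subsystem whose state reaches the controller after `obs_delay` steps. The controller's information state is the tuple of the most recent delayed state together with the actions taken since then, which is finite and fully observable, so the delayed problem becomes an ordinary MDP on this augmented state space.

```python
from collections import deque
import random


def simulate_delayed_mdp(transition, initial_state, policy, obs_delay, horizon, seed=0):
    """Simulate an MDP whose state observations arrive after `obs_delay` steps.

    At time t the controller sees s_{t-d} (d = obs_delay) and remembers the
    actions a_{t-d}, ..., a_{t-1} it has issued since. The pair
    (s_{t-d}, (a_{t-d}, ..., a_{t-1})) is the augmented, fully observable
    state on which a Markov policy suffices.
    """
    rng = random.Random(seed)
    state = initial_state
    # Buffer of the last d+1 states; its oldest entry is the delayed observation.
    state_buffer = deque([initial_state] * (obs_delay + 1), maxlen=obs_delay + 1)
    # Actions issued since the delayed observation was generated.
    recent_actions = deque(maxlen=obs_delay)
    trajectory = []
    for _ in range(horizon):
        delayed_state = state_buffer[0]          # s_{t-d}
        aug_state = (delayed_state, tuple(recent_actions))
        action = policy(aug_state, rng)          # policy acts on augmented state
        trajectory.append((aug_state, action))
        state = transition(state, action, rng)   # true (unobserved) dynamics
        state_buffer.append(state)
        recent_actions.append(action)
    return trajectory


# Toy example: two-state chain, action 1 flips the state, delay of 2 steps.
traj = simulate_delayed_mdp(
    transition=lambda s, a, rng: s ^ a,
    initial_state=0,
    policy=lambda aug, rng: 1,
    obs_delay=2,
    horizon=5,
)
```

After `obs_delay` steps the delayed component of the augmented state matches the true state from `obs_delay` steps earlier, while the action history lets the controller reconstruct the belief over the current state exactly (the dynamics here are deterministic, so the belief is a point mass).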