Model Checking MDPs With a Unique Compact Invariant Set of Distributions
The semantics of Markov Decision Processes (MDPs), when viewed as transformers of probability distributions, can described as a labeled transition system over the probability distributions over the states of the MDP. The MDP can be seen as defining a set of executions, where each execution is a sequence of probability distributions. Reasoning about sequences of distributions allows one to express properties not expressible in logics like PCTL; examples include expressing bounds on transient rewards and expected values of random variables, as well as comparing the probability of being in one set of states at a given time with another set of states. With respect to such a semantics, the problem of checking that the MDP never reaches a bad distribution is undecidable.