Solving Stochastic Games
In reinforcement learning, Bellman's dynamic programming equation is typically viewed as a method for determining the value function - the maximum achievable utility at each state. Instead, the authors can view the Bellman equation as a method of determining all possible achievable utilities. In the single-agent case they care only about the maximum utility, but for multiple agents it is rare to be able to simultaneous maximize all agents' utilities. In this paper they seek to find the set of all achievable joint utilities (a vector of utilities, one for each player). This set is known as the feasible-set. Given this goal they can reconstruct a proper multi-agent equivalent to the Bellman equation that operates on feasible-sets for each state instead of values.