Institute of Electrical and Electronics Engineers
This paper explores approximate methods for solving Markov decision processes (MDPs) for large systems through policy iteration. Two methods, one using an embedded discrete-time Markov chain and the other using time-scale separation, are defined and compared with the solution obtained using traditional policy iteration. First-step solutions are found and compared for a radio resource management problem with two radio access technologies and two service types. The proposed approaches considerably reduce the computational cost while closely approximating the optimal solution. The solutions are extended by increasing the number of policy iteration steps, and results show that the performance of the optimal policy can be reached at a reduced computational cost even when several steps are required.
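For context, the baseline technique the abstract refers to is standard policy iteration, which alternates exact policy evaluation with greedy policy improvement. The sketch below is a minimal, generic illustration on a hypothetical 2-state, 2-action MDP with made-up transition probabilities and rewards; it is not the paper's radio resource management model, whose state space is far larger.

```python
import numpy as np

# Hypothetical toy MDP: P[a, s, s'] transition probabilities, R[a, s] rewards.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.1, 0.9]],   # action 1
])
R = np.array([[1.0, 0.0],       # action 0
              [2.0, 1.0]])      # action 1
gamma = 0.9
n_states = 2

policy = np.zeros(n_states, dtype=int)  # start with action 0 in every state
while True:
    # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
    P_pi = P[policy, np.arange(n_states)]   # (S, S) rows under current policy
    r_pi = R[policy, np.arange(n_states)]   # (S,) rewards under current policy
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    # Policy improvement: greedy one-step lookahead over all actions.
    q = R + gamma * (P @ v)                 # (A, S) action values
    new_policy = q.argmax(axis=0)
    if np.array_equal(new_policy, policy):  # stable policy => optimal
        break
    policy = new_policy
```

The exact linear solve in the evaluation step is what becomes expensive for large state spaces, which motivates the approximate variants the paper compares.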