Optimal Cross-Layer Wireless Control Policies Using TD Learning
The authors present an online cross-layer control technique for characterizing and approximating optimal policies in wireless networks. Their approach combines network utility maximization and adaptive modulation over an infinite discrete-time horizon, using a class of performance measures they call time-smoothed utility functions. They model the system as an average-cost Markov decision problem. Model approximations are used to find suitable basis functions for applying least-squares TD-learning techniques. The approach yields network control policies that learn the underlying characteristics of the random wireless channel and approximately optimize network performance.
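To make the value-function-approximation idea concrete, here is a minimal sketch of least-squares TD (LSTD) on a toy Markov chain. This is not the authors' cross-layer formulation: it uses a discounted criterion rather than their average-cost one, and the two-state chain, rewards, and polynomial basis are hypothetical choices for illustration. The core step, solving a linear system A w = b built from observed transitions and basis features, is the same mechanism the paper applies with model-derived basis functions.

```python
import numpy as np

def lstd(transitions, phi, gamma=0.95, reg=1e-6):
    """Least-squares TD: solve A w = b so that V(s) ~= phi(s) @ w.

    transitions: list of (state, reward, next_state) samples
    phi: feature map state -> basis-function vector
    reg: small ridge term to keep A invertible
    """
    k = phi(transitions[0][0]).shape[0]
    A = reg * np.eye(k)
    b = np.zeros(k)
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)  # accumulate LSTD statistics
        b += f * r
    return np.linalg.solve(A, b)

# Toy two-state chain (hypothetical, for illustration only).
rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1],    # transition probabilities from state 0
              [0.2, 0.8]])   # transition probabilities from state 1
rewards = np.array([1.0, 0.0])          # per-state reward
phi = lambda s: np.array([1.0, float(s)])  # simple linear basis

# Simulate the chain to collect transition samples.
s = 0
transitions = []
for _ in range(5000):
    s_next = rng.choice(2, p=P[s])
    transitions.append((s, rewards[s], s_next))
    s = s_next

w = lstd(transitions, phi)
V = np.array([phi(s) @ w for s in range(2)])
# State 0 earns reward and tends to persist, so V[0] > V[1].
```

In the paper's setting the features would come from approximations of the wireless-channel model rather than a hand-picked basis, and the learned weights would define the approximately optimal control policy online.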