We present formal methods for improving multiple policies for solving controlled Markov set-chains with the infinite-horizon discounted reward criterion. The multi-policy improvement methods follow the spirit of parallel rollout for solving Markov decision processes (MDPs). In particular, these methods are useful for on-line control of Markov set-chains and for approximately solving MDPs via state aggregation. We further discuss issues in designing a policy-iteration-type algorithm based on our policy improvement methods.
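As a rough illustration of the parallel-rollout spirit the abstract refers to, the sketch below performs one multi-policy improvement step on an ordinary finite MDP: evaluate each base policy, take the pointwise maximum of their value functions, and act greedily with respect to that envelope. This is the standard-MDP analogue only, not the paper's Markov set-chain construction, and all numbers (transition matrices, rewards, the two base policies) are hypothetical.

```python
# Hedged sketch: parallel-rollout-style multi-policy improvement on a
# small finite MDP with discounted reward. All model data is illustrative.

GAMMA = 0.9
N_STATES, N_ACTIONS = 3, 2

# P[a][s][t]: probability of moving s -> t under action a (hypothetical).
P = [
    [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],  # action 0
    [[0.2, 0.8, 0.0], [0.0, 0.2, 0.8], [0.5, 0.0, 0.5]],  # action 1
]
# R[s][a]: expected one-step reward (hypothetical).
R = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.5]]

def evaluate(policy, iters=500):
    """Iterative policy evaluation for the discounted criterion."""
    V = [0.0] * N_STATES
    for _ in range(iters):
        V = [R[s][policy[s]]
             + GAMMA * sum(P[policy[s]][s][t] * V[t] for t in range(N_STATES))
             for s in range(N_STATES)]
    return V

def parallel_rollout_improve(policies):
    """One greedy step over the pointwise max of the base-policy values.

    In standard MDPs the resulting policy's value dominates that max
    at every state, which is the improvement property parallel rollout
    builds on."""
    values = [evaluate(pi) for pi in policies]
    v_max = [max(v[s] for v in values) for s in range(N_STATES)]
    improved = []
    for s in range(N_STATES):
        q = [R[s][a]
             + GAMMA * sum(P[a][s][t] * v_max[t] for t in range(N_STATES))
             for a in range(N_ACTIONS)]
        improved.append(max(range(N_ACTIONS), key=q.__getitem__))
    return improved, v_max

# Two hypothetical stationary base policies (one action index per state).
pi_new, v_max = parallel_rollout_improve([[0, 0, 0], [1, 1, 1]])
```

The controlled Markov set-chain setting of the paper replaces the single transition matrix per action with a set of matrices, so the evaluation step there yields interval-valued quantities rather than the scalar values computed here.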