Cavazos-Cadena, Rolando - In: Mathematical Methods of Operations Research 54 (2001) 1, pp. 63-99
This note concerns discrete-time Markov decision processes with denumerable state space. A control policy is graded by the long-run expected average reward criterion, and the main feature of the model is that the reward function and the transition law depend on an unknown parameter. Besides...
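The long-run expected average reward criterion mentioned above is commonly defined as J(x, π) = lim inf_{n→∞} (1/n) E_x^π[ Σ_{t<n} R(θ, X_t, A_t) ], where θ is the model parameter. The sketch below illustrates this quantity only. It is not the method of the paper, and it rests on several assumptions. The state space is finite rather than denumerable. θ is fixed and known rather than unknown. The policy is stationary and deterministic. The lim inf is replaced by a finite-horizon Cesàro average. All function names and the two-state example are hypothetical.

```python
from typing import Callable, Dict, List

State = int
Action = int


def average_reward(
    states: List[State],
    policy: Callable[[State], Action],
    reward: Callable[[float, State, Action], float],
    transition: Callable[[float, State, Action], Dict[State, float]],
    theta: float,
    start: State,
    horizon: int,
) -> float:
    """Hypothetical helper: finite-horizon Cesaro average
    (1/n) * sum_{t<n} E[R(theta, X_t, A_t)] for a stationary policy,
    used here as a stand-in for the lim inf in the average criterion."""
    # Distribution of X_t, starting from a point mass at `start`.
    dist = {s: 0.0 for s in states}
    dist[start] = 1.0
    total = 0.0
    for _ in range(horizon):
        # Expected one-step reward under the current state distribution.
        total += sum(p * reward(theta, s, policy(s)) for s, p in dist.items())
        # Push the distribution forward: mu_{t+1}(y) = sum_x mu_t(x) * p_xy(theta, policy(x)).
        nxt = {s: 0.0 for s in states}
        for s, p in dist.items():
            if p == 0.0:
                continue
            for y, q in transition(theta, s, policy(s)).items():
                nxt[y] += p * q
        dist = nxt
    return total / horizon


# Hypothetical two-state model: theta is the probability of moving from 0 to 1.
def toy_transition(theta: float, x: State, a: Action) -> Dict[State, float]:
    return {0: 1.0 - theta, 1: theta} if x == 0 else {0: 0.5, 1: 0.5}


# Reward 1 in state 0 and 0 in state 1, independent of theta and the action.
def toy_reward(theta: float, x: State, a: Action) -> float:
    return 1.0 if x == 0 else 0.0


g = average_reward([0, 1], lambda x: 0, toy_reward, toy_transition,
                   theta=0.25, start=0, horizon=10_000)
# g is close to 2/3, the stationary probability of state 0 when theta = 0.25.
```

Because the reward and the transition law both take θ as an argument, re-running the helper with a different θ changes the value of the same policy. This is the dependence on the unknown parameter that the abstract describes.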