Mousavi, Mohammad; Glynn, Peter - 2014
We present a fully nonparametric method to estimate the value function, via simulation, in the context of expected infinite-horizon discounted rewards for Markov chains. Estimating such value functions plays an important role in approximate dynamic programming. We incorporate “soft...
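
As a point of reference for the quantity being estimated, the value function of a finite Markov chain under expected infinite-horizon discounted rewards can be approximated by averaging simulated discounted returns. The sketch below is a plain Monte Carlo baseline and not the nonparametric method of the paper. The transition matrix `P`, reward vector `r`, discount factor `gamma`, truncation `horizon`, and all function names are assumptions introduced for illustration only.

```python
import random


def simulate_discounted_return(P, r, start, gamma, horizon, rng):
    """Return one truncated sample of sum_t gamma^t * r[X_t] with X_0 = start."""
    state, total, discount = start, 0.0, 1.0
    for _ in range(horizon):
        total += discount * r[state]
        discount *= gamma
        # Draw the next state from row `state` of the transition matrix.
        state = rng.choices(range(len(P)), weights=P[state])[0]
    return total


def estimate_value(P, r, gamma, n_paths=1000, horizon=150, seed=0):
    """Monte Carlo estimate of the value function at every starting state."""
    rng = random.Random(seed)
    values = []
    for s in range(len(P)):
        samples = (
            simulate_discounted_return(P, r, s, gamma, horizon, rng)
            for _ in range(n_paths)
        )
        values.append(sum(samples) / n_paths)
    return values


def exact_value(P, r, gamma, sweeps=500):
    """Reference solution by fixed-point iteration on v = r + gamma * P v."""
    n = len(P)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [r[i] + gamma * sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
    return v


if __name__ == "__main__":
    # Hypothetical two-state chain, chosen only to exercise the code.
    P = [[0.9, 0.1], [0.5, 0.5]]
    r = [1.0, 0.0]
    print("simulated:", estimate_value(P, r, gamma=0.9))
    print("reference:", exact_value(P, r, gamma=0.9))
```

Truncating each path at `horizon` steps introduces a bias of order `gamma ** horizon`, which is negligible for the values assumed here. The fixed-point reference is feasible only because the example chain is tiny, and simulation-based estimation matters precisely when it is not.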