Bagnell, J. Andrew; Schneider, Jeff - 2003
Much recent work in reinforcement learning and stochastic optimal control has focused on algorithms that search directly through a space of policies rather than building approximate value functions. Policy search has numerous advantages: it does not rely on the Markov assumption, domain...