Donchev, Doncho; Yushkevich, Alexander - In: Mathematical Methods of Operations Research 45 (1997) 2, pp. 265-280
In terms of a posteriori probabilities, a symmetric Poissonian two-armed bandit becomes a piecewise deterministic Markov decision process. For the case of switching arms, only one of which generates rewards, we solve explicitly the average optimality equation and prove that a myopic policy is...
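To make the setting concrete, the following is a minimal simulation sketch, not the authors' model: a discretized symmetric Poissonian bandit in which a single "active" arm emits Poisson rewards at rate `lam` and switches position at rate `nu` (both hypothetical parameter names). The state is the posterior probability `x` that arm 1 is active, and the myopic policy simply pulls the arm with posterior at least 1/2.

```python
import random

def simulate_myopic(T=1000.0, dt=0.01, lam=1.0, nu=0.1, seed=0):
    """Myopic policy on a discretized symmetric Poissonian two-armed bandit.

    Assumed (hypothetical) parameters:
      lam -- reward rate of the currently active arm
      nu  -- rate at which the active arm switches position
    State: x = posterior probability that arm 1 is the active one.
    Returns the average reward per unit time.
    """
    rng = random.Random(seed)
    true_arm = 1        # hidden: which arm currently generates rewards
    x = 0.5             # posterior P(arm 1 is active)
    total_reward = 0
    for _ in range(int(T / dt)):
        # hidden dynamics: the active arm may switch position
        if rng.random() < nu * dt:
            true_arm = 3 - true_arm
        # myopic policy: pull the arm more likely to be active
        choice = 1 if x >= 0.5 else 2
        # observation: a Poisson event only if the pulled arm is active
        event = (choice == true_arm) and (rng.random() < lam * dt)
        total_reward += event
        # Bayes update of x from the observation on the pulled arm
        p1 = lam * dt if choice == 1 else 0.0  # P(event | arm 1 active)
        p2 = lam * dt if choice == 2 else 0.0  # P(event | arm 2 active)
        if event:
            num, den = x * p1, x * p1 + (1 - x) * p2
        else:
            num, den = x * (1 - p1), x * (1 - p1) + (1 - x) * (1 - p2)
        x = num / den if den > 0 else 0.5
        # deterministic drift of the posterior due to possible switching
        x = x * (1 - nu * dt) + (1 - x) * nu * dt
    return total_reward / T
```

Between observed events the posterior evolves deterministically (the last line above), while each observation triggers a Bayesian jump; this deterministic-flow-plus-jumps structure is what makes the problem a piecewise deterministic Markov decision process.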