Rusmevichientong, Paat; Mersereau, Adam J.; Tsitsiklis, … - 2009
We consider a multiarmed bandit problem where the expected reward of each arm is a linear function of an unknown scalar with a prior distribution. The objective is to choose a sequence of arms that maximizes the expected total (or discounted total) reward. We demonstrate the effectiveness of a...