Salomon, Antoine; Audibert, Jean-Yves - Université Paris-Dauphine (Paris IX) - 2014
This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known beforehand by the agent, Audibert et al. [2] exhibit a policy such that with probability at least 1−1/n, the regret of the policy is of order log n. They have...