We design new policies that ensure both worst-case optimality for the expected regret and light-tailed risk for the regret distribution in the stochastic multi-armed bandit problem. Recently, \cite{fan2021fragility} showed that information-theoretically optimized bandit algorithms suffer from some...
Persistent link: https://www.econbiz.de/10014083162
We build a new unified modeling and analysis framework for a broad class of online matching problems. The proposed unified framework encompasses a number of classical online matching problems and accommodates three practical features: reusable resources, network resources and decaying rewards....
Persistent link: https://www.econbiz.de/10014086265