Weaver, Ian; Kumar, Vineet - 2022
We propose a novel, theory-based approach to the reinforcement learning problem of maximizing profits when the demand curve is unknown. Our method is based on multi-armed bandits, a class of nonparametric models that rely on minimal assumptions and balance exploration and exploitation...
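The abstract does not say which bandit algorithm the authors use, so the following is only a minimal sketch of the general idea. It treats each candidate price as an arm and uses the standard UCB1 rule, which is swapped in here purely for illustration and is not presented as the authors' method. The linear demand curve in `simulate_purchase`, the price grid, and the zero marginal cost are all hypothetical assumptions. UCB1 expects rewards in [0, 1], so realized profit is divided by the highest price.

```python
import math
import random


def simulate_purchase(price: float, rng: random.Random) -> bool:
    """Hypothetical demand curve: purchase probability falls linearly in price.

    This curve is an illustrative assumption; the seller never observes it directly.
    """
    buy_prob = max(0.0, 1.0 - price / 10.0)
    return rng.random() < buy_prob


def ucb1_pricing(prices, horizon, seed=0):
    """Run UCB1 over a fixed grid of candidate prices, one arm per price.

    Reward per round is realized profit (the price if a sale occurs, else 0,
    assuming zero marginal cost), scaled into [0, 1] by the highest price.
    Returns how many times each price was chosen.
    """
    rng = random.Random(seed)
    max_price = max(prices)
    counts = [0] * len(prices)
    reward_sums = [0.0] * len(prices)

    for t in range(1, horizon + 1):
        if t <= len(prices):
            # Initialization: try each price once.
            arm = t - 1
        else:
            # Exploitation term (empirical mean reward) plus an exploration
            # bonus that shrinks as a price is tried more often.
            arm = max(
                range(len(prices)),
                key=lambda i: reward_sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        price = prices[arm]
        profit = price if simulate_purchase(price, rng) else 0.0
        counts[arm] += 1
        reward_sums[arm] += profit / max_price
    return counts


if __name__ == "__main__":
    prices = [2.0, 5.0, 8.0]
    counts = ucb1_pricing(prices, horizon=50_000)
    for price, n in zip(prices, counts):
        print(f"price {price:.0f}: chosen {n} times")
```

Under the assumed demand curve, the expected profits per round are 1.6, 2.5, and 1.6 for prices 2, 5, and 8. The exploration bonus therefore keeps sampling the two weaker prices only occasionally, while most rounds settle on price 5.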