Cheung, Wang Chi; Simchi-Levi, David; Zhu, Ruihao - 2021
Motivated by applications in inventory control and real-time bidding, we consider undiscounted reinforcement learning (RL) in Markov decision processes (MDPs) under temporal drifts. In this setting, both the reward and state transition distributions are allowed to evolve over time, as long as...