Dolgopolov, Arthur - 2022
I fully characterize the outcomes of a wide class of model-free reinforcement learning algorithms, such as Q-learning, in a prisoner’s dilemma. The behavior is studied in the limit as players explore their options sufficiently and eventually stop experimenting. Whether the players learn to...
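The abstract does not spell out the algorithms. The following is therefore only a minimal sketch of the kind of setting it describes: two independent, memoryless Q-learners repeatedly playing a prisoner's dilemma under epsilon-greedy exploration that decays toward zero. The payoff values, the decay schedule `1/(1 + decay*t)`, the parameters `alpha`, `gamma`, `decay`, and the names `PAYOFF`, `epsilon_greedy` and `run` are all illustrative assumptions rather than the paper's. The sketch makes no claim about which outcome the learners reach.

```python
import random

# Hypothetical prisoner's dilemma payoffs (T=5 > R=3 > P=1 > S=0); assumed, not from the paper.
# Actions: 0 = cooperate, 1 = defect. PAYOFF[(a_row, a_col)] = (row payoff, column payoff).
PAYOFF = {
    (0, 0): (3.0, 3.0),  # mutual cooperation (R, R)
    (0, 1): (0.0, 5.0),  # sucker vs. temptation (S, T)
    (1, 0): (5.0, 0.0),  # temptation vs. sucker (T, S)
    (1, 1): (1.0, 1.0),  # mutual defection (P, P)
}


def epsilon_greedy(q, eps, rng):
    """Explore uniformly with probability eps, otherwise play the greedy action."""
    if rng.random() < eps:
        return rng.randrange(2)
    return 0 if q[0] >= q[1] else 1


def run(steps=20000, alpha=0.1, gamma=0.9, decay=1e-3, seed=0):
    """Two memoryless Q-learners play the repeated PD while exploration decays.

    Returns the final Q-tables, the last joint action, and the last exploration rate.
    All parameter defaults are hypothetical choices for illustration.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[player][action]
    actions = (0, 0)
    eps = 1.0
    for t in range(steps):
        eps = 1.0 / (1.0 + decay * t)  # assumed schedule: exploration shrinks toward 0
        actions = (epsilon_greedy(q[0], eps, rng), epsilon_greedy(q[1], eps, rng))
        rewards = PAYOFF[actions]
        for i in range(2):
            a = actions[i]
            # Stateless Q-learning update: the "next state" is the same single state.
            target = rewards[i] + gamma * max(q[i])
            q[i][a] += alpha * (target - q[i][a])
    return q, actions, eps
```

Calling `q, last, eps = run()` returns both players' Q-values after exploration has largely died out. Changing `decay` or `alpha` is one simple way to see how the speed at which experimentation stops can affect where the greedy actions settle.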