Bossaerts, Peter L.; Huang, Shijie; Yadav, Nitin - In: Risks : open access journal 8 (2020) 4/113, pp. 1-20
In traditional Reinforcement Learning (RL), agents learn to optimize actions in a dynamic context based on recursive estimation of expected values. We show that this form of machine learning fails when rewards (returns) are affected by tail risk, i.e., leptokurtosis. Here, we adapt a recent...
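
The claim about recursive estimation of expected values can be illustrated with a minimal sketch. It is not the method of the paper: it only shows the generic update V <- V + a(r - V) and how a single extreme reward displaces the estimate. The function names, the learning rate, and the choice of a Student-t distribution with 3 degrees of freedom (rescaled to unit variance) as the leptokurtic reward source are all assumptions made for illustration.

```python
import random


def recursive_value_estimates(rewards, learning_rate=0.1, initial_value=0.0):
    """Return the running estimate after each reward, using V <- V + a * (r - V)."""
    value = initial_value
    path = []
    for reward in rewards:
        value += learning_rate * (reward - value)
        path.append(value)
    return path


def student_t_sample(rng, df):
    """Draw one Student-t variate as Z / sqrt(chi2 / df); heavy-tailed for small df."""
    z = rng.gauss(0.0, 1.0)
    # A chi-square variate with df degrees of freedom is Gamma(shape=df/2, scale=2).
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return z / (chi2 / df) ** 0.5


def max_abs_error(path, true_value=0.0):
    """Largest distance between the running estimate and the true expected value."""
    return max(abs(value - true_value) for value in path)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the comparison is repeatable
    n = 10_000
    df = 3  # assumption: small df gives pronounced tail risk
    unit_variance_scale = (df / (df - 2)) ** 0.5  # Var(t_df) = df / (df - 2)

    gaussian_rewards = [rng.gauss(0.0, 1.0) for _ in range(n)]
    heavy_rewards = [student_t_sample(rng, df) / unit_variance_scale for _ in range(n)]

    # Both reward streams have mean 0 and variance 1; only the tails differ.
    for label, rewards in (("gaussian", gaussian_rewards), ("student-t", heavy_rewards)):
        path = recursive_value_estimates(rewards, learning_rate=0.1)
        print(label, "worst-case estimation error:", max_abs_error(path))
```

With a learning rate of 0.5, the rewards 1, 1, 1, 100 move the estimate from 0.875 to 50.4375 in one step, which is the kind of outlier sensitivity the abstract attributes to tail risk.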