Optimistic posterior sampling for reinforcement learning : worst-case regret bounds
Year of publication: | 2023 |
---|---|
Authors: | Agrawal, Shipra ; Jia, Randy |
Published in: | Mathematics of operations research. - Hanover, Md. : INFORMS, ISSN 1526-5471, ZDB-ID 2004273-5. - Vol. 48.2023, 1, p. 363-392 |
Subject: | Markov decision process | regret bounds | reinforcement learning | Thompson sampling | Stichprobenerhebung | Sampling | Markov-Kette | Markov chain | Entscheidung | Decision | Lernprozess | Learning process | Lernen | Learning | Entscheidung unter Unsicherheit | Decision under uncertainty | Begrenzte Rationalität | Bounded rationality | Entscheidungstheorie | Decision theory |
- Choosing a good toolkit, I : prior-free heuristics
  Francetich, Alejandro, (2020)
- Reinforcement learning in robust Markov decision processes
  Lim, Shiau Hong, (2016)
- Robo-advising : learning investors' risk preferences via portfolio choices
  Alsabah, Humoud, (2021)
- Agrawal, Shipra, (2022)
- A Unified Framework for Dynamic Pari-Mutuel Information Market Design
  Agrawal, Shipra, (2009)
- Equilibrium in prediction markets with buyers and sellers
  Agrawal, Shipra, (2010)