Optimistic posterior sampling for reinforcement learning : worst-case regret bounds
Year of publication: | 2023 |
---|---|
Authors: | Agrawal, Shipra ; Jia, Randy |
Published in: | Mathematics of operations research. - Hanover, Md. : INFORMS, ISSN 1526-5471, ZDB-ID 2004273-5. - Vol. 48.2023, 1, p. 363-392 |
Subject: | Markov decision process ; regret bounds ; reinforcement learning ; Thompson sampling ; Sampling ; Markov chain ; Decision ; Learning process ; Learning ; Decision under uncertainty ; Bounded rationality ; Decision theory |
- Choosing a good toolkit, I : prior-free heuristics
  Francetich, Alejandro, (2020)
- Poisoning finite-horizon Markov decision processes at design time
  Caballero, William N., (2021)
- Small-loss bounds for online learning with partial information
  Lykouris, Thodoris, (2022)
- Agrawal, Shipra, (2022)
- Parimutuel betting on permutations
  Agrawal, Shipra, (2008)
- Equilibrium in prediction markets with buyers and sellers
  Agrawal, Shipra, (2010)