Regret analysis of a Markov policy gradient algorithm for multiarm bandits
| Year of publication: | 2023 |
|---|---|
| Authors: | Walton, Neil ; Denisov, Denis |
| Published in: | Mathematics of operations research. - Hanover, Md. : INFORMS, ISSN 1526-5471, ZDB-ID 2004273-5. - Vol. 48.2023, 3, p. 1553-1588 |
| Subject: | 60J05 ; Foster–Lyapunov ; Markov chains ; multiarm bandit ; policy gradient ; regret ; Theorie ; Theory ; Entscheidung unter Unsicherheit ; Decision under uncertainty ; Markov-Kette ; Markov chain |
- Technical uncertainty in real options with learning
  Jaimungal, Sebastian, (2018)
- Essays on barriers to growth, strategic behavior and uncertainty
  Livshits, Igor, (2002)
- Chen, Yu-Fu, (2009)
- Denisov, Denis, (2013)
- Probabilistic approach to risk processes with level-dependent premium rate
  Denisov, Denis, (2024)
- Closed queueing networks under congestion : nonbottleneck independence and bottleneck convergence
  Anselmi, Jonatha, (2013)