On boundedness of Q-learning iterates for stochastic shortest path problems
Year of publication: | 2013 |
---|---|
Authors: | Yu, Huizhen ; Bertsekas, Dimitri P. |
Published in: | Mathematics of operations research. - Catonsville, MD : INFORMS, ISSN 0364-765X, ZDB-ID 195683-8. - Vol. 38.2013, 2, p. 209-227 |
Subject: | Markov decision processes | Q-learning | stochastic approximation | dynamic programming | reinforcement learning | Theory | Stochastic process | Markov chain | Learning process | Mathematical programming | Dynamic programming |
- Condition-based production for stochastically deteriorating systems : optimal policies and learning
  Drent, Collin (2024)
- Envelope theorems for multistage linear stochastic optimization
  Terça, Gonçalo (2021)
- Dynamic multi-priority, multi-class patient scheduling with stochastic service times
  Sauré, Antoine (2020)
- On near optimality of the set of finite-state controllers for average cost POMDP
  Yu, Huizhen (2008)
- Error bounds for approximations from projected linear equations
  Yu, Huizhen (2010)
- Q-learning and enhanced policy iteration in discounted dynamic programming
  Bertsekas, Dimitri P. (2012)