Polynomial-time computation of strong and n-present-value optimal policies in Markov decision chains
Year of publication: | August 2017 |
---|---|
Authors: | O'Sullivan, Michael ; Veinott, Arthur F. <Jr.> |
Published in: | Mathematics of operations research. - Catonsville, MD : INFORMS, ISSN 0364-765X, ZDB-ID 195683-8. - Vol. 42.2017, 3, p. 577-598 |
Subject: | dynamic programming ; computational complexity ; infinite horizon ; Theory ; Markov chain ; Mathematical programming |
- Finite-memory strategies in POMDPs with long-run average objectives
  Chatterjee, Krishnendu, (2022)
- On boundedness of Q-learning iterates for stochastic shortest path problems
  Yu, Huizhen, (2013)
- Robust modified policy iteration
  Kaufman, David L., (2013)
- Constrained Markov decision chains
  Derman, Cyrus, (1972)
- Maximum-stopping-value policies in finite Markov population decision chains
  Eaves, Burchett Curtis, (2014)
- Multicommodity production planning : qualitative analysis and applications
  Ciurria-Infosino, Iara, (2015)