Ordinal Dynamic Programming
Numerically valued reward processes are found in most dynamic programming models. Mitten, however, recently formulated finite horizon sequential decision processes in which a real-valued reward need not be earned at each stage. Instead of the cardinality assumption implicit in past models, Mitten assumes that a decision maker has a preference order over a general collection of outcomes (which need not be numerically valued). This paper investigates infinite horizon ordinal dynamic programming models. Both deterministic and stochastic models are considered. It is shown that an optimal policy exists if and only if some stationary policy is optimal. Moreover, "policy improvement" leads to better policies using either Howard-Blackwell or Eaton-Zadeh procedures. The results illuminate the roles played by various sets of assumptions in the literature on Markovian decision processes.
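The abstract's central idea, that a decision maker needs only a preference order over outcomes rather than numerical rewards, can be sketched in a toy finite-horizon, deterministic setting of the kind Mitten considered. The following is a minimal illustration, not the paper's construction: all state, action, and outcome names are hypothetical, the outcome is assumed to be earned only at the horizon, and the preference order is encoded purely so outcomes can be compared, never added or averaged.

```python
# Hypothetical ordinal backward induction: outcomes are labels, not numbers.
# The decision maker supplies only a preference order over outcomes (assumed data).
preference = ["poor", "fair", "good", "excellent"]   # least to most preferred
rank = {o: i for i, o in enumerate(preference)}      # used only for comparison

HORIZON = 2
actions = {(0, "s0"): ["a", "b"], (1, "s1"): ["a", "b"], (1, "s2"): ["a"]}
transitions = {(0, "s0", "a"): "s1", (0, "s0", "b"): "s2",
               (1, "s1", "a"): "t1", (1, "s1", "b"): "t2", (1, "s2", "a"): "t3"}
terminal_outcome = {"t1": "fair", "t2": "excellent", "t3": "good"}

def best_outcome(stage, state):
    """Most preferred terminal outcome reachable from `state` at `stage`."""
    if stage == HORIZON:
        return terminal_outcome[state]
    candidates = [best_outcome(stage + 1, transitions[(stage, state, act)])
                  for act in actions[(stage, state)]]
    return max(candidates, key=rank.get)  # only order comparisons, no arithmetic

print(best_outcome(0, "s0"))  # -> "excellent"
```

Because the recursion uses only comparisons under the preference order, no cardinal reward structure is ever invoked; the infinite-horizon stochastic case analyzed in the paper requires substantially more machinery than this sketch.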
Year of publication: 1975
Authors: Sobel, Matthew J.
Published in: Management Science. - Institute for Operations Research and the Management Sciences - INFORMS, ISSN 0025-1909. - Vol. 21.1975, 9, p. 967-975
Publisher: Institute for Operations Research and the Management Sciences - INFORMS
Similar items by person
- Scheduling projects with stochastic activity duration to maximize expected net present value. Sobel, Matthew J. (2009)
- Capital accumulation and the optimization of renewable resource models. Mendelssohn, Roy (1980)
- Risk-Sensitive Dynamic Market Share Attraction Games. Monahan, George E. (1997)
- More ...