Regret analysis of a Markov policy gradient algorithm for multiarm bandits

Neil Walton, Denis Denisov

Year of publication:	2023
Authors:	Walton, Neil ; Denisov, Denis
Published in:	Mathematics of operations research. - Hanover, Md. : INFORMS, ISSN 1526-5471, ZDB-ID 2004273-5. - Vol. 48.2023, 3, p. 1553-1588
Subject:	60J05 | Foster–Lyapunov | Markov chains | multiarm bandit | policy gradient | regret | Theory | Decision under uncertainty | Markov chain

Online Resource

Type of publication:	Article
Type of publication (narrower categories):	Article in journal
Language:	English
Other identifiers:	10.1287/moor.2022.1311 [DOI]
Source:	ECONIS - Online Catalogue of the ZBW

Persistent link: https://www.econbiz.de/10014329345
