Optimistic posterior sampling for reinforcement learning: worst-case regret bounds

Shipra Agrawal, Randy Jia

Year of publication:	2023
Authors:	Agrawal, Shipra ; Jia, Randy
Published in:	Mathematics of Operations Research. - Hanover, Md. : INFORMS, ISSN 1526-5471, ZDB-ID 2004273-5. - Vol. 48 (2023), no. 1, pp. 363-392
Subject:	Markov decision process | regret bounds | reinforcement learning | Thompson sampling | Sampling | Markov chain | Decision | Learning process | Learning | Decision under uncertainty | Bounded rationality | Decision theory

Type of publication:	Article
Type of publication (narrower categories):	Article in journal
Language:	English
Other identifiers:	10.1287/moor.2022.1266 [DOI]
Source:	ECONIS - Online Catalogue of the ZBW

Persistent link: https://www.econbiz.de/10014312555