What is the most statistically efficient way to do off-policy optimization with batch data from bandit feedback? For log data generated by contextual bandit algorithms, we consider offline estimators for the expected reward from a counterfactual policy. Our estimators are shown to have the...
Persistent link: https://www.econbiz.de/10012906605
Persistent link: https://www.econbiz.de/10014394217
Persistent link: https://www.econbiz.de/10012515858
Persistent link: https://www.econbiz.de/10014435186
Persistent link: https://www.econbiz.de/10013387729
Persistent link: https://www.econbiz.de/10013393675
Persistent link: https://www.econbiz.de/10014539002
We examine how to learn personalized customer retention strategies when customers' intentions to purchase evolve over time. Working with a Japanese online platform, we first implement a large-scale randomized experiment, in which coupons are randomly sent to first-time buyers at different times....
Persistent link: https://www.econbiz.de/10014235545
Persistent link: https://www.econbiz.de/10012134518
We study the effect of different school choice mechanisms on schools' incentives for quality improvement. To do so, we introduce the following criterion: A mechanism respects improvements of school quality if each school becomes weakly better off whenever that school becomes more preferred by...
Persistent link: https://www.econbiz.de/10009353445