What is the most statistically efficient way to do off-policy optimization with batch data from bandit feedback? For log data generated by contextual bandit algorithms, we consider offline estimators for the expected reward from a counterfactual policy. Our estimators are shown to have the...
Persistent link: https://www.econbiz.de/10012906605
What is the most statistically efficient way to do off-policy optimization with batch data from bandit feedback? For log data generated by contextual bandit algorithms, we consider offline estimators for the expected reward from a counterfactual policy. Our estimators are shown to have lowest...
Persistent link: https://www.econbiz.de/10012907150
Persistent link: https://www.econbiz.de/10011948939
Persistent link: https://www.econbiz.de/10014539002
Persistent link: https://www.econbiz.de/10013387729
Persistent link: https://www.econbiz.de/10014435186
Persistent link: https://www.econbiz.de/10014394217
Persistent link: https://www.econbiz.de/10013393675
We examine how to learn personalized customer retention strategies when customers' intentions to purchase evolve over time. Working with a Japanese online platform, we first implement a large-scale randomized experiment, in which coupons are randomly sent to first-time buyers at different times....
Persistent link: https://www.econbiz.de/10014235545
Persistent link: https://www.econbiz.de/10012625086