This paper surveys the relevant existing literature that can help researchers and policy makers understand the drivers of competition in markets that constitute the provision of artificial intelligence products. The focus is on three broad markets: training data, input data, and AI predictions....
Persistent link: https://www.econbiz.de/10014512124
Identifying near duplicates within large, noisy text corpora has a myriad of applications that range from de-duplicating training datasets, reducing privacy risk, and evaluating test set leakage, to identifying reproduced news articles and literature within large corpora. Across these diverse...
Persistent link: https://www.econbiz.de/10013477218
The last 40 years have seen huge innovations in computing technology and data availability. Data derived from millions of administrative records or by using (as we do) new methods of data generation such as text mining are now common. New data often requires new methods, which in turn can...
Persistent link: https://www.econbiz.de/10012479239
We study a model where firms accumulate data as a valuable intangible asset. Data accumulation affects firms' dynamics. It increases the skewness of the firm size distribution as large firms generate more data and invest more in active experimentation. On the other hand, small data-savvy firms...
Persistent link: https://www.econbiz.de/10012479471
Policymakers can take actions to prevent local conflict before it begins, if such violence can be accurately predicted. We examine the two countries with the richest available sub-national data: Colombia and Indonesia. We assemble two decades of fine-grained violence data by type, alongside...
Persistent link: https://www.econbiz.de/10012479929
A key challenge for research on many questions in the social sciences is that it is difficult to link historical records in a way that allows investigators to observe people at different points in their life or across generations. In this paper, we develop a new approach that relies on millions...
Persistent link: https://www.econbiz.de/10012480171
We document the degree of price dispersion and the similarities as well as differences in pricing and promotion strategies across stores in the U.S. retail (grocery) industry. Our analysis is based on "big data" that allow us to draw general conclusions based on the prices for close to 50,000...
Persistent link: https://www.econbiz.de/10012480251
Text data is ultra-high dimensional, which makes machine learning techniques indispensable for textual analysis. Text is often selected: journalists, speechwriters, and others craft messages to target their audiences' limited attention. We develop an economically motivated high dimensional...
Persistent link: https://www.econbiz.de/10012480461
Modern investors face a high-dimensional prediction problem: thousands of observable variables are potentially relevant for forecasting. We reassess the conventional wisdom on market efficiency in light of this fact. In our model economy, which resembles a typical machine learning setting, N...
Persistent link: https://www.econbiz.de/10012480530
This paper combines a data rich environment with a machine learning algorithm to provide new estimates of time-varying systematic expectational errors ("belief distortions") embedded in survey responses. We find that distortions are large on average even for professional forecasters, with all...
Persistent link: https://www.econbiz.de/10012481601