Inequality in Utility of Data and Its Implications for Data Management
Abstract: The volume and complexity of organizational data resources are increasing rapidly and hence demand larger investments in data-management solutions. We posit that justifying these investments requires a better understanding of the utility of data, that is, the business value gained by using it. As a first step in this direction, we explore utility inequality: the extent to which records in a large dataset differ in their business-value contribution. The distribution of utility and the magnitude of utility inequality in a data resource have important implications for data management: they can affect the design and administration of the data resource, inform data acquisition and retention policies, and assist in prioritizing quality-improvement efforts. We propose analytical tools for modeling and quantifying inequality and demonstrate their application by assessing inequality in a large alumni data repository. We also examine the implications of utility inequality for data-management decisions associated with this repository.