This paper addresses the classification of linked entities. We introduce a relational vector-space (VS) model (in analogy to the VS model used in information retrieval) that abstracts the linked structure, representing entities by vectors of weights. Given labeled data as background knowledge training...
Persistent link: https://www.econbiz.de/10012766083
This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated labeling, and focus especially on the improvement of training labels for supervised induction. With the outsourcing of...
Persistent link: https://www.econbiz.de/10012766133
This paper demonstrates that "social network collaborative filtering" (SNCF), wherein user-selected like-minded alters are used to make predictions, can rival traditional user-to-user collaborative filtering (CF) in predictive accuracy. Using a unique data set from an online community...
Persistent link: https://www.econbiz.de/10012768374
Persistent link: https://www.econbiz.de/10012769152
Prediction in financial domains is notoriously difficult for a number of reasons. First, theories tend to be weak or non-existent, which makes problem formulation open-ended by forcing us to consider a large number of independent variables and thereby increasing the dimensionality of the search...
Persistent link: https://www.econbiz.de/10012769780
For many supervised learning tasks, the cost of acquiring training data is dominated by the cost of class labeling. In this work, we explore active learning for class probability estimation (CPE). Active learning acquires data incrementally, using the model learned so far to help identify especially...
Persistent link: https://www.econbiz.de/10012769782
Tree induction and logistic regression are two standard, off-the-shelf methods for building models for classification. We present a large-scale experimental comparison of logistic regression and tree induction, assessing classification accuracy and the quality of rankings based on class-membership...
Persistent link: https://www.econbiz.de/10012769783
We address the problem of comparing the performance of classifiers. In this paper we study techniques for generating and evaluating bands on ROC curves. Historically this has been done using one-dimensional confidence intervals by freezing one variable - false-positive rate, or threshold on the...
Persistent link: https://www.econbiz.de/10012769786
This paper presents NetKit, a modular toolkit for classification in networked data, and a case study of its application to a collection of networked data sets used in prior machine learning research. Networked data are relational data where entities are interconnected, and this paper considers...
Persistent link: https://www.econbiz.de/10012769932
This paper is about constructing confidence bands around an ROC curve such that (1 - \delta)% of the ROC curves traced by data sets of size r will fall completely within the bands. We introduce to the machine learning community three methods from the medical field that are applicable to generate...
Persistent link: https://www.econbiz.de/10012769934