Similar Search Results

Bernstein, Abraham - 2009

This paper addresses the classification of linked entities. We introduce a relational vector-space (VS) model (in analogy to the VS model used in information retrieval) that abstracts the linked structure, representing entities by vectors of weights. Given labeled data as background knowledge training...

Persistent link: https://www.econbiz.de/10012766083

Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers

Sheng, Victor - 2009

This paper addresses the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated labeling, and focus especially on the improvement of training labels for supervised induction. With the outsourcing of...

Persistent link: https://www.econbiz.de/10012766133

Social Network Collaborative Filtering

Zheng, Rong - 2008

This paper demonstrates that "social network collaborative filtering" (SNCF), wherein user-selected like-minded alters are used to make predictions, can rival traditional user-to-user collaborative filtering (CF) in predictive accuracy. Using a unique data set from an online community...

Persistent link: https://www.econbiz.de/10012768374

Tree Induction Vs Logistic Regression : A Learning Curve Analysis

Perlich, Claudia - 2008

Persistent link: https://www.econbiz.de/10012769152

Discovering Interesting Patterns for Investment Decision Making with Glower - a Genetic Learner Overlaid with Entropy Reduction

Dhar, Vasant - 2008

Prediction in financial domains is notoriously difficult for a number of reasons. First, theories tend to be weak or non-existent, which makes problem formulation open-ended by forcing us to consider a large number of independent variables and thereby increasing the dimensionality of the search...

Persistent link: https://www.econbiz.de/10012769780

Variance-Based Active Learning

Saar-Tsechansky, Maytal - 2008

For many supervised learning tasks, the cost of acquiring training data is dominated by the cost of class labeling. In this work, we explore active learning for class probability estimation (CPE). Active learning acquires data incrementally, using the model learned so far to help identify especially...

Persistent link: https://www.econbiz.de/10012769782

Tree Induction Vs. Logistic Regression : a Learning-Curve Analysis

Perlich, Claudia - 2008

Tree induction and logistic regression are two standard, off-the-shelf methods for building models for classification. We present a large-scale experimental comparison of logistic regression and tree induction, assessing classification accuracy and the quality of rankings based on class-membership...

Persistent link: https://www.econbiz.de/10012769783

Confidence Bands for ROC Curves

Macskassy, Sofus - 2008

We address the problem of comparing the performance of classifiers. In this paper we study techniques for generating and evaluating bands on ROC curves. Historically this has been done using one-dimensional confidence intervals by freezing one variable - false-positive rate, or threshold on the...

Persistent link: https://www.econbiz.de/10012769786

Classification in Networked Data : a Toolkit and a Univariate Case Study

Macskassy, Sofus - 2008

This paper presents NetKit, a modular toolkit for classification in networked data, and a case-study of its application to a collection of networked data sets used in prior machine learning research. Networked data are relational data where entities are interconnected, and this paper considers...

Persistent link: https://www.econbiz.de/10012769932

ROC Confidence Bands : An Empirical Study

Macskassy, Sofus - 2008

This paper is about constructing confidence bands around an ROC curve such that (1 - \delta)% of the ROC curves traced by data sets of size r will fall completely within the bands. We introduce to the machine learning community three methods from the medical field that are applicable to generate...

Persistent link: https://www.econbiz.de/10012769934