Mehran Sahami scite author profile

Many supervised machine learning algorithms require a discrete feature space. In this paper, we review previous work on continuous feature discretization, identify de ning characteristics of the methods, and conduct an empirical evaluation of several methods. We compare binning, an unsupervised discretization method, to entropy-based and purity-based methods, which are supervised algorithms. We found that the performance of the Naive-Bayes algorithm signi cantly improved when features were discretized using an entropy-based method. In fact, over the 16 tested datasets, the discretized version of Naive-Bayes slightly outperformed C4.5 on average. We also show that in some cases, the performance of the C4.5 induction algorithm signi cantly improved if features were discretized in advance; in our experiments, the performance never signi cantly degraded, an interesting phenomenon considering the fact that C4.5 is capable of locally discretizing features.

show abstract

Inductive learning algorithms and representations for text categorization

Dumais

et al. 1998

View full text Add to dashboard Cite

Text categorization -the assignment of natural language texts to one or more predefined categories based on their content -is an important component in many information organization and management tasks. We compare the effectiveness of five different automatic learning algorithms for text categorization in terms of learning speed, realtime classification speed, and classification accuracy. We also examine training set size, and alternative document representations. Very accurate text classifiers can be learned automatically from training examples. Linear Support Vector Machines (SVMs) are particularly promising because they are very accurate, quick to train, and quick to evaluate.

show abstract

A web-based kernel function for measuring the similarity of short text snippets

2006

View full text Add to dashboard Cite

Autonomously Generating Hints by Inferring Problem Solving Policies

et al. 2015

View full text Add to dashboard Cite

Exploring the whole sequence of steps a student takes to produce work, and the patterns that emerge from thousands of such sequences is fertile ground for a richer understanding of learning. In this paper we autonomously generate hints for the Code.org 'Hour of Code,' (which is to the best of our knowledge the largest online course to date) using historical student data. We first develop a family of algorithms that can predict the way an expert teacher would encourage a student to make forward progress. Such predictions can form the basis for effective hint generation systems. The algorithms are more accurate than current state-of-the-art methods at recreating expert suggestions, are easy to implement and scale well. We then show that the same framework which motivated the hint generating algorithms suggests a sequence-based statistic that can be measured for each learner. We discover that this statistic is highly predictive of a student's future success.

show abstract

Programming Pluralism: Using Learning Analytics to Detect Patterns in the Learning of Computer Programming

Blikstein

Worsley²,

Piech

et al. 2014

Journal of the Learning Sciences

233

102

View full text Add to dashboard Cite

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Mehran Sahami

Supervised and Unsupervised Discretization of Continuous Features

Inductive learning algorithms and representations for text categorization

A web-based kernel function for measuring the similarity of short text snippets

Autonomously Generating Hints by Inferring Problem Solving Policies

Programming Pluralism: Using Learning Analytics to Detect Patterns in the Learning of Computer Programming

Contact Info

Product

Resources

About