Abstract. We describe an experimental study of pruning methods for decision tree classifiers when the goal is minimizing loss rather than error. In addition to two common methods for error minimization, CART's cost-complexity pruning and C4.5's error-based pruning, we study the extension of cost-complexity pruning to loss and one pruning variant based on the Laplace correction. We perform an empirical comparison of these methods and evaluate them with respect to loss. We found that applying the Laplace correction to estimate the probability distributions at the leaves was beneficial to all pruning methods. Unlike in error minimization, and somewhat surprisingly, performing no pruning led to results that were on par with other methods in terms of the evaluation criteria. The main advantage of pruning was in the reduction of the decision tree size, sometimes by a factor of ten. While no method dominated others on all datasets, even for the same domain different pruning mechanisms are better for different loss matrices.

1 Pruning Decision Trees

Decision trees are a widely used symbolic modeling technique for classification tasks in machine learning. The most common approach to constructing decision tree classifiers is to grow a full tree and prune it back. Pruning is desirable because the tree that is grown may overfit the data by inferring more structure than is justified by the training set. Specifically, if there are no conflicting instances, the training-set error of a fully built tree is zero, while the true error is likely to be larger. To combat this overfitting problem, the tree is pruned back with the goal of identifying the tree with the lowest error rate on previously unobserved instances, breaking ties in favor of smaller trees (Breiman, Friedman, Olshen & Stone 1984; Quinlan 1993). Several pruning methods have been introduced in the literature, including cost-complexity pruning, reduced-error pruning, pessimistic pruning, error-based pruning, penalty pruning, and MDL pruning. Historically, most pruning algorithms have been developed to minimize the expected error rate of the decision tree, assuming that classification errors have the same unit cost.
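To make the two ingredients discussed above concrete, the following is a minimal sketch (not the authors' implementation) of how a leaf's class-probability estimates can be smoothed with the Laplace correction, p_i = (n_i + 1) / (N + C), and how a loss matrix then determines the loss-minimizing prediction at that leaf. The function names and the example loss matrix are illustrative assumptions, not artifacts from the paper.

```python
import numpy as np

def laplace_probs(class_counts):
    """Laplace-corrected class probabilities at a leaf:
    p_i = (n_i + 1) / (N + C), where N is the number of training
    instances reaching the leaf and C is the number of classes."""
    counts = np.asarray(class_counts, dtype=float)
    return (counts + 1.0) / (counts.sum() + len(counts))

def min_loss_label(class_counts, loss_matrix):
    """Choose the prediction minimizing expected loss at a leaf.
    loss_matrix[i][j] is the cost of predicting class j when the
    true class is i (zero on the diagonal)."""
    p = laplace_probs(class_counts)
    expected_loss = p @ np.asarray(loss_matrix, dtype=float)
    return int(np.argmin(expected_loss)), float(expected_loss.min())

# Hypothetical example: a leaf with 8 instances of class 0 and 2 of
# class 1, where missing class 1 costs five times a false alarm.
counts = [8, 2]
loss = [[0.0, 1.0],   # true class 0: predicting 1 costs 1
        [5.0, 0.0]]   # true class 1: predicting 0 costs 5
print(min_loss_label(counts, loss))  # predicts class 1
```

Note how the asymmetric loss matrix flips the decision: under plain error minimization the leaf would predict the majority class 0, but with the smoothed probabilities (0.75, 0.25) the expected loss of predicting 0 is 1.25 versus 0.75 for predicting 1.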