In classification problems with ordinal attributes, the class label is often required to increase with some or all of the explanatory attributes. Such problems are called classification problems with monotonicity constraints. Classical decision tree algorithms such as CART or C4.5 generally do not produce monotone trees, even when the dataset itself is completely monotone. This paper surveys the methods that have so far been proposed for generating decision trees that satisfy monotonicity constraints. A distinction is made between methods that work only for monotone datasets and methods that work for monotone and non-monotone datasets alike.
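To make the monotonicity constraint concrete, here is a minimal Python sketch; the function names and the brute-force pairwise check are illustrative assumptions, not taken from the survey. A dataset is monotone when no example that dominates another on every attribute receives a smaller class label:

```python
from itertools import combinations

def dominates(a, b):
    """True if example a is >= example b on every ordinal attribute."""
    return all(ai >= bi for ai, bi in zip(a, b))

def is_monotone(X, y):
    """Check that componentwise-larger examples never get a smaller label.

    X: list of attribute tuples, y: list of ordinal class labels.
    Uses O(n^2) pairwise comparisons; fine for illustration.
    """
    for (xi, yi), (xj, yj) in combinations(zip(X, y), 2):
        if dominates(xi, xj) and yi < yj:
            return False
        if dominates(xj, xi) and yj < yi:
            return False
    return True

# Toy monotone dataset: raising attribute values never lowers the class.
X = [(1, 1), (1, 2), (2, 2), (3, 1)]
y = [0, 1, 1, 1]
print(is_monotone(X, y))  # True
```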
We build a support system for predicting the end prices of eBay auctions. The predictions are based on the textual item descriptions found in eBay listings and on several numerical item features. The system uses text mining and boosting algorithms from the field of machine learning. It substantially outperforms the naive method of predicting the category mean price. Moreover, interpreting the model enables us to identify influential terms in the item descriptions and shows that the item description is more influential than the seller's feedback rating, which earlier studies had found to be influential.
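As a purely illustrative companion, the following is a minimal sketch of this kind of pipeline, assuming scikit-learn tooling, a gradient boosting regressor as the boosting variant, and toy placeholder data; it is not the paper's actual model, feature set, or dataset. It combines TF-IDF text features with numerical features and compares against the mean-price baseline mentioned in the abstract:

```python
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import GradientBoostingRegressor

# Toy placeholder data (not the paper's dataset): listing texts plus
# numerical features such as seller feedback rating and starting bid.
descriptions = [
    "nintendo ds lite white boxed with charger",
    "broken nintendo ds for parts no charger",
    "nintendo ds lite new sealed",
    "used nintendo ds some scratches works fine",
]
numeric = np.array([[250, 9.99], [40, 0.99], [800, 19.99], [120, 4.99]])
end_prices = np.array([78.0, 22.0, 95.0, 55.0])

# Bag-of-words features from the descriptions, stacked with the
# numerical features into one design matrix.
text_features = TfidfVectorizer().fit_transform(descriptions)
X = hstack([text_features, csr_matrix(numeric)]).toarray()

model = GradientBoostingRegressor(n_estimators=200, max_depth=3)
model.fit(X, end_prices)
preds = model.predict(X)

# Baseline from the abstract: always predict the (category) mean price.
# Errors here are on training data only; a real comparison would use a
# held-out test set.
baseline = np.full_like(end_prices, end_prices.mean())
print("model RMSE:   ", np.sqrt(np.mean((preds - end_prices) ** 2)))
print("baseline RMSE:", np.sqrt(np.mean((baseline - end_prices) ** 2)))
```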
In this paper, various ensemble learning methods from machine learning and statistics are considered and applied to the customer choice modeling problem. Ensemble learning usually improves the prediction quality of flexible models such as decision trees and thus leads to better predictions. We give experimental results for two real-life marketing datasets using decision trees, ensemble versions of decision trees, and the logistic regression model, which is a standard approach for this problem. The ensemble models are found to improve upon individual decision trees and to outperform logistic regression.

Next, an additive decomposition of a model's prediction error, known as the bias/variance decomposition, is considered. A model with a high bias lacks the flexibility to fit the data well; a high variance indicates that a model is unstable across different training sets. Decision trees have a high variance component and a low bias component in their prediction error, whereas logistic regression has a high bias component and a low variance component. It is shown that ensemble methods aim to minimize the variance component of the prediction error while leaving the bias component unaltered. Bias/variance decompositions of all models on both customer choice datasets are given to illustrate these concepts.
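For reference, the squared-error form of this decomposition can be written as follows; the abstract itself gives no formula, so the notation here is the standard one, with $f$ the true regression function, $\hat{f}_D$ the model fitted on a random training set $D$, and $\sigma^2$ the irreducible noise:

$$
\mathbb{E}\big[(y - \hat{f}_D(x))^2\big]
= \underbrace{\big(f(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_D\Big[\big(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2\Big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{noise}}.
$$

Averaging many high-variance, low-bias models, as bagging does with trees, shrinks the variance term while leaving the bias term essentially unchanged, which is why tree ensembles can beat both a single tree and logistic regression. For 0/1 classification loss, analogous but more involved decompositions exist; the squared-error form above is shown only as the simplest illustration.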
This paper proposes a new proof of Dilworth's theorem. The proof is based upon the min-flow/max-cut property of flow networks. In connection with this proof, a new method is presented for finding both a Dilworth decomposition (a partition of a partially ordered set into a minimum number of chains) and a maximum antichain.
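The paper's construction goes through min-flow/max-cut; as an illustrative companion only, the sketch below implements the classical alternative reduction of a Dilworth decomposition to maximum bipartite matching (Fulkerson's construction), not the paper's method. The function names and the divisibility example are our own:

```python
def dilworth_decomposition(elements, leq):
    """Split a finite poset (distinct elements) into a minimum set of chains.

    Classical reduction: build a bipartite graph with an edge (i, j)
    whenever elements[i] < elements[j], find a maximum matching with
    Kuhn's augmenting-path algorithm, and read the chains off the
    matched edges. By Dilworth's theorem, the number of chains,
    len(elements) - |matching|, equals the size of a maximum antichain.
    """
    n = len(elements)
    succ = [[j for j in range(n) if i != j and leq(elements[i], elements[j])]
            for i in range(n)]
    match_to = [None] * n  # match_to[j] = i means edge i -> j is matched

    def augment(i, seen):
        for j in succ[i]:
            if j in seen:
                continue
            seen.add(j)
            if match_to[j] is None or augment(match_to[j], seen):
                match_to[j] = i
                return True
        return False

    for i in range(n):
        augment(i, set())

    # Matched edges form vertex-disjoint paths; follow them to get chains.
    next_of = {match_to[j]: j for j in range(n) if match_to[j] is not None}
    chains = []
    for j in range(n):
        if match_to[j] is None:  # chain start: no matched edge ends here
            chain, cur = [], j
            while cur is not None:
                chain.append(elements[cur])
                cur = next_of.get(cur)
            chains.append(chain)
    return chains

# Example: divisibility order on {1, 2, 3, 4, 6, 12}.
chains = dilworth_decomposition([1, 2, 3, 4, 6, 12],
                                lambda a, b: b % a == 0)
print(chains)  # [[1, 2, 4, 12], [3, 6]] -- two chains, matching the
               # maximum antichain {4, 6} of size two
```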