The past decade has seen significant interest in the problem of inducing decision trees that take account of both the costs of misclassification and the costs of acquiring the features used for decision making. This survey identifies over 50 algorithms, including approaches that directly adapt accuracy-based methods, use genetic algorithms, use anytime methods, or utilize boosting and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, provides a useful taxonomy and a historical timeline of how the field has developed, and should serve as a useful reference point for future research in this field.
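One commonly cited example of a direct adaptation of an accuracy-based method is EG2, which replaces plain information gain with an Information Cost Function, ICF = (2^ΔI − 1) / (C + 1)^ω, where ΔI is the information gain of an attribute, C is its test cost, and ω ∈ [0, 1] controls how strongly cost is weighted (with ω = 0 the ranking reduces to ranking by gain alone). The minimal sketch below contrasts the two criteria on a single split. The attribute names, class counts and costs are hypothetical, chosen only to show how a cheap but weaker attribute can outrank an expensive but perfectly separating one.

```python
import math

def entropy(counts):
    """Shannon entropy (bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def info_gain(parent_counts, child_counts_list):
    """Information gain (bits) of splitting parent_counts into the given children."""
    n = sum(parent_counts)
    weighted = sum(sum(ch) / n * entropy(ch) for ch in child_counts_list)
    return entropy(parent_counts) - weighted

def eg2_icf(gain, test_cost, omega=1.0):
    """EG2 Information Cost Function: (2**gain - 1) / (test_cost + 1)**omega."""
    return (2 ** gain - 1) / (test_cost + 1) ** omega

# Hypothetical candidate attributes: (class counts in each branch, test cost)
parent = [10, 10]
candidates = {
    "cheap_weak": ([[7, 3], [3, 7]], 1.0),
    "costly_strong": ([[10, 0], [0, 10]], 50.0),
}

def score(name):
    """Cost-sensitive score of a candidate attribute under EG2's ICF."""
    children, cost = candidates[name]
    return eg2_icf(info_gain(parent, children), cost)

# Accuracy-based choice (max gain) versus cost-sensitive choice (max ICF)
by_gain = max(candidates, key=lambda a: info_gain(parent, candidates[a][0]))
by_icf = max(candidates, key=score)
print(by_gain, by_icf)  # → costly_strong cheap_weak
```

Here the gains are about 0.12 and 1.0 bits, but the test cost of 50 shrinks the second attribute's ICF below that of the cheap one, so the cost-sensitive criterion picks a different split.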
Decision tree induction is a widely used technique for learning from data, which first emerged in the 1980s. In recent years, several authors have noted that in practice accuracy alone is not adequate, and it has become increasingly important to take into consideration the cost of misclassifying the data. Several authors have developed techniques to induce cost-sensitive decision trees. Many studies include pairwise comparisons of algorithms, but no earlier work has carried out a comparison spanning many methods. This paper aims to remedy this situation by investigating different cost-sensitive decision tree induction algorithms. A survey identified 30 cost-sensitive decision tree algorithms, which can be organized into 10 categories. A representative sample of these algorithms has been implemented and empirically evaluated. In addition, an accuracy-based look-ahead algorithm has been extended into a new cost-sensitive look-ahead algorithm, which was also evaluated. The main outcome of the evaluation is that an algorithm based on genetic algorithms, known as Inexpensive Classification with Expensive Tests (ICET), performed best across the full range of experiments. This shows that, to make a decision tree cost-sensitive, it is better to include all the different types of costs, that is, the cost of obtaining the data and the misclassification costs, in the induction of the decision tree.
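The conclusion above depends on measuring a tree by both kinds of cost at once. A minimal sketch of such a combined measure follows: each example pays the test cost of every attribute measured on its path through the tree, plus a misclassification penalty from a cost matrix. It only evaluates an existing tree and does not induce one, and it is not ICET's implementation. The `Node` class, the attribute names, the costs and the rule that a test is paid once per example are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    attribute: Optional[str] = None                       # None marks a leaf
    children: Dict[str, "Node"] = field(default_factory=dict)
    label: Optional[str] = None                           # predicted class at a leaf

def classify_with_cost(node, example, test_costs):
    """Follow the example down the tree, paying each attribute's test cost once."""
    paid = set()
    cost = 0.0
    while node.attribute is not None:
        if node.attribute not in paid:
            cost += test_costs[node.attribute]
            paid.add(node.attribute)
        node = node.children[example[node.attribute]]
    return node.label, cost

def total_cost(tree, examples, labels, test_costs, misclass_cost):
    """Sum of test costs plus misclassification costs over a dataset.

    misclass_cost[(actual, predicted)] is the penalty; missing pairs cost 0.
    """
    total = 0.0
    for x, y in zip(examples, labels):
        pred, tc = classify_with_cost(tree, x, test_costs)
        total += tc + misclass_cost.get((y, pred), 0.0)
    return total

# Hypothetical tree: cheap temperature test first, expensive blood test only if needed
tree = Node(attribute="temperature", children={
    "high": Node(attribute="blood_test", children={
        "pos": Node(label="sick"),
        "neg": Node(label="healthy"),
    }),
    "normal": Node(label="healthy"),
})
test_costs = {"temperature": 1.0, "blood_test": 20.0}
misclass_cost = {("sick", "healthy"): 100.0, ("healthy", "sick"): 10.0}
examples = [
    {"temperature": "high", "blood_test": "pos"},
    {"temperature": "normal", "blood_test": "pos"},
    {"temperature": "high", "blood_test": "neg"},
]
labels = ["sick", "sick", "healthy"]

cost = total_cost(tree, examples, labels, test_costs, misclass_cost)
print(cost)  # → 143.0
```

The second example is cheap to classify (cost 1) but is misdiagnosed, so it dominates the total; a learner that optimizes only one of the two cost terms would not see this trade-off.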
facility and frequency of their measurement, this "unpleasant state" could be speciously labelled as anxiety or dysphoria. The personality theories of Horney (18) and Rogers (19) are germane. Horney's
This paper develops a new algorithm for inducing cost-sensitive decision trees that is inspired by the multi-armed bandit problem, in which a player in a casino has to decide which slot machine (bandit) from a selection of slot machines is likely to pay out the most. Game theory proposes a solution to this multi-armed bandit problem through a process of exploration and exploitation in which reward is maximized. This paper uses these concepts to develop a new algorithm by viewing the rewards as a reduction in costs and applying exploration and exploitation so that a compromise between decisions based on accuracy and decisions based on costs can be found. The algorithm employs the notion of lever pulls in the multi-armed bandit game to select attributes during decision tree induction, using a lookahead methodology to explore potential attributes and exploit the attributes that maximize the reward. The new algorithm is evaluated on fifteen datasets and compared with six well-known algorithms: J48, EG2, MetaCost, AdaCostM1, ICET and ACT. The results show that the new multi-armed bandit based algorithm can produce more cost-effective trees without compromising accuracy. The paper also includes a critical appraisal of the limitations of the new algorithm and proposes avenues for further research.
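To make the exploration/exploitation idea concrete, the sketch below treats each candidate attribute as a bandit arm and allocates "lever pulls" with the standard UCB1 rule: pick the arm with the highest mean reward plus a confidence bonus that shrinks as the arm is pulled more. This is a generic bandit technique swapped in for illustration, not the paper's own lever-pull and lookahead scheme. The `reward_fn` callback (imagined as a normalized cost reduction in [0, 1] from a lookahead), the attribute names and their reward probabilities are all hypothetical.

```python
import math
import random

def ucb1_select_attribute(attributes, reward_fn, budget):
    """Allocate `budget` pulls over candidate attributes using UCB1.

    reward_fn(attr) is a hypothetical callback returning a reward in [0, 1],
    e.g. a normalized cost reduction estimated by a lookahead.
    Returns the most-pulled attribute, pull counts and mean rewards.
    """
    if budget < len(attributes):
        raise ValueError("budget must allow at least one pull per attribute")
    counts = {a: 0 for a in attributes}
    sums = {a: 0.0 for a in attributes}
    for t in range(1, budget + 1):
        untried = [a for a in attributes if counts[a] == 0]
        if untried:
            arm = untried[0]  # explore: pull every arm once before using the index
        else:
            # UCB1 index: exploit high mean reward, explore rarely pulled arms
            arm = max(attributes, key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        counts[arm] += 1
        sums[arm] += reward_fn(arm)
    best = max(attributes, key=lambda a: counts[a])
    means = {a: sums[a] / counts[a] for a in attributes}
    return best, counts, means

# Hypothetical attributes whose pulls succeed (reward 1) with these probabilities
rng = random.Random(0)
true_reward = {"age": 0.2, "blood_pressure": 0.5, "cholesterol": 0.8}
best, counts, means = ucb1_select_attribute(
    list(true_reward),
    lambda a: 1.0 if rng.random() < true_reward[a] else 0.0,
    budget=1000,
)
print(best, counts)
```

With enough pulls the budget concentrates on the attribute with the highest expected reward, while weaker attributes are still sampled occasionally; that occasional sampling is the exploration side of the compromise the abstract describes.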