Summary: When classification rules are represented as decision trees, a simple tree structure is as important as predictive accuracy, particularly with regard to human comprehensibility, memory requirements, and classification time. Trees tend to become complex as their accuracy increases. This paper proposes a novel method, based on symbiotic evolution, for generating decision trees that are both accurate and simple. The distinctive feature of symbiotic evolution is that two different populations are evolved in parallel by genetic algorithms. In our method, the individuals of one population are partial trees of height 1, and the individuals of the other are whole trees represented as combinations of those partial trees. In general, overfitting to the training examples prevents high predictive accuracy. To circumvent this difficulty, individuals are evaluated not only by their accuracy on the training examples but also by the correct answer biased rate, which indicates the dispersion of correct answers across the terminal nodes. Based on this method, we developed a decision tree generation system called SESAT. Our experimental results show that SESAT compares favorably with other systems on several datasets from the UCI repository. SESAT can generate simpler trees than C5.0 without sacrificing predictive accuracy.
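The two-population representation described in this abstract can be illustrated with a minimal sketch. The encoding below is an assumption made for illustration only, not SESAT's actual data structure: `PartialTree`, `classify`, `accuracy`, the threshold test, and the slot-pointer scheme are all hypothetical names and choices. The correct answer biased rate is not defined in the abstract, so the sketch evaluates training accuracy only.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class PartialTree:
    """Height-1 tree: one test node with two branches (hypothetical encoding)."""
    attribute: int         # index of the attribute tested at this node
    threshold: float       # take the left branch if value <= threshold
    left: Union[str, int]  # class label (str) or slot index in the blueprint (int)
    right: Union[str, int]


def classify(blueprint: List[int], partials: List[PartialTree],
             example: List[float], max_depth: int = 10) -> str:
    """Walk a whole tree. blueprint[k] is the index into `partials` used for
    slot k; slot 0 is the root. The depth cap guards against cyclic pointers."""
    slot = 0
    for _ in range(max_depth):
        node = partials[blueprint[slot]]
        branch = node.left if example[node.attribute] <= node.threshold else node.right
        if isinstance(branch, str):
            return branch                # reached a terminal node
        slot = branch % len(blueprint)   # follow the pointer to another slot
    return "unknown"


def accuracy(blueprint: List[int], partials: List[PartialTree],
             examples: List[Tuple[List[float], str]]) -> float:
    """Fraction of labelled examples that the whole tree classifies correctly."""
    correct = sum(classify(blueprint, partials, x) == label for x, label in examples)
    return correct / len(examples)


# Population 1: partial trees. Population 2: blueprints that combine them.
partials = [
    PartialTree(attribute=0, threshold=5.0, left="A", right=1),
    PartialTree(attribute=1, threshold=2.0, left="B", right="C"),
]
blueprint = [0, 1]
examples = [([3.0, 0.0], "A"), ([7.0, 1.0], "B"), ([7.0, 3.0], "B")]
score = accuracy(blueprint, partials, examples)
```

In a symbiotic scheme, both populations would be evolved in parallel, with each blueprint scored by a function such as `accuracy` combined with whatever additional terms the method defines.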
Microsimulation models of land use characterize attributes of a household and its location, referred to as microdata in this study. However, methods for evaluating the goodness of fit between estimated and observed sets of agent-based microdata have not been investigated extensively. Although the attributes of a household include various items, such as the relationship with the household head and ages of the members, housing type and spatial location, number of cars owned, and income, the attributes can be classified into general categories. The objective of the present study is to develop a goodness-of-fit evaluation method for agent-based household microdata sets composed of generalized attributes. First, a distance measure between the estimated and observed microdata for each household is defined. In this definition a generalized scheme is introduced, whereby attributes are structured by the household composition, attributes of the member, and attributes of the household as a whole. The goodness of fit is measured on the basis of the minimum sum of distances for all households in the study area. The calculation cannot be carried out with just a conventional algorithm for microdata of a typical size because the number of calculations increases in proportion to the factorial (N!) of the number (N) of agents. Therefore, a genetic algorithm, especially one using symbiotic evolution, is developed to solve the problem. The effectiveness of the method in regard to accuracy and calculation feasibility is confirmed by using person trip survey data for the Sapporo metropolitan area in Japan.
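The factorial growth mentioned in this abstract can be seen in a small sketch of the minimum-sum matching. The distance function here (a count of differing attributes) and the attribute names `size`, `housing`, and `cars` are assumptions for illustration, not the generalized distance measure the study defines. The exhaustive search is shown only to make the N! cost concrete.

```python
from itertools import permutations
from math import factorial


def household_distance(est: dict, obs: dict) -> int:
    """Hypothetical distance: the number of attributes whose values differ."""
    return sum(1 for key in obs if est.get(key) != obs[key])


def min_total_distance(estimated: list, observed: list) -> int:
    """Try every one-to-one pairing of estimated and observed households and
    return the smallest sum of distances. The loop runs N! times."""
    n = len(observed)
    best = None
    for perm in permutations(range(n)):
        total = sum(household_distance(estimated[i], observed[j])
                    for i, j in enumerate(perm))
        if best is None or total < best:
            best = total
    return best


estimated = [
    {"size": 2, "housing": "detached", "cars": 1},
    {"size": 4, "housing": "apartment", "cars": 0},
    {"size": 1, "housing": "apartment", "cars": 0},
]
observed = [
    {"size": 1, "housing": "apartment", "cars": 1},
    {"size": 2, "housing": "detached", "cars": 1},
    {"size": 4, "housing": "apartment", "cars": 2},
]

goodness_of_fit = min_total_distance(estimated, observed)
pairings_tried = factorial(len(observed))  # 6 pairings for N = 3
```

With three households the search visits only 6 pairings, but the count grows as N!, which is why exhaustive enumeration is impractical for microdata of a typical size.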
Summary: This paper proposes a novel method for generating a decision tree that accurately discriminates polymers from their near-infrared spectra. A polymer discrimination system is needed for recycling plastics, and the near-infrared spectrum is useful for rapid, non-destructive discrimination. The earlier system SESAT, which is based on symbiotic evolution, can generate simple and accurate trees, but it is not effective for data with many attributes, such as near-infrared spectra. We design the structure of the partial solution, the "sprig", to allow sufficient learning, and the fitness function of the whole solution, the "decision tree blueprint", for 2-class discrimination. In addition, we introduce two-step discrimination to obtain higher accuracy. In the first step, examples are divided into two groups, one of which is easier than the other to discriminate with a tree. In the second step, two trees are generated, one for each group of examples, each discriminating one kind of polymer from the others. In this way, even a minority of examples is discriminated accurately. Based on this method, we developed a polymer discrimination system called TS-SEPT. Our experimental results on real polymer data show that the accuracy of TS-SEPT compares favorably with that of other systems: a similar system without two-step discrimination, SESAT, and C5.0. The results indicate that both the method for generating decision trees and the two-step discrimination contributed to the improved accuracy.