1998
DOI: 10.1007/bfb0033269

Multi-interval discretization methods for decision tree learning

Abstract: Properly addressing the discretization of continuous-valued features is an important problem during decision tree learning. This paper describes four multi-interval discretization methods for the induction of decision trees, used in a dynamic fashion. We compare two known discretization methods to two new methods proposed in this paper: a histogram-based method and a neural-net-based method (LVQ). We compare them according to the accuracy of the resulting decision tree and to the compactness of the tr…
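The abstract names a histogram-based discretizer among the compared methods, but the details are cut off above. Purely as an illustration, and not the paper's actual algorithm, the sketch below shows one generic histogram-style, multi-interval discretization: equal-frequency binning whose interior quantile boundaries become the cut points. All function names are hypothetical.

```python
import numpy as np

def equal_frequency_cuts(values, n_intervals=4):
    """Generic histogram-style discretization: place cut points at
    quantile boundaries so each interval holds roughly the same
    number of samples. Returns the interior cut points only."""
    qs = np.linspace(0, 1, n_intervals + 1)[1:-1]   # interior quantiles
    return np.unique(np.quantile(values, qs))       # drop duplicate cuts

def assign_interval(x, cuts):
    """Map a single value to its interval index given sorted cut points."""
    return int(np.searchsorted(cuts, x, side="right"))

# Toy usage: one continuous feature split into 4 intervals.
feature = np.array([0.2, 0.5, 0.9, 1.3, 2.1, 2.2, 3.7, 4.0, 4.8, 5.5])
cuts = equal_frequency_cuts(feature, n_intervals=4)
print(cuts, [assign_interval(x, cuts) for x in feature])
```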

Cited by 31 publications (11 citation statements)
References 7 publications
“…Based on that data set we acquired the knowledge for classification. We used a binary [5] and n-ary decision tree induction algorithm [6] realized in our data mining tool DECISIONMASTER [7]. The n-ary decision tree can split up a numerical feature into more than two intervals which leads sometimes to a better performance than the one of a binary decision tree.…”
Section: Learning Of Classifier Knowledge
confidence: 99%
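The quoted passage contrasts binary and n-ary splits on a numeric feature. DECISIONMASTER's own interface is not shown here, so the snippet below is only a generic sketch of the routing difference: a binary node uses one threshold with two children, while an n-ary node uses k cut points defining k+1 child intervals.

```python
def binary_route(x, threshold):
    """Binary node: one threshold, two children."""
    return 0 if x <= threshold else 1

def nary_route(x, cut_points):
    """n-ary node: k sorted cut points define k+1 intervals (children).
    The child index is the interval the value falls into."""
    child = 0
    for cut in cut_points:
        if x <= cut:
            return child
        child += 1
    return child

# A value of 2.7 goes to child 1 of a binary node split at 2.0, but to
# child 2 of a 4-way node that splits the same feature at 1.0, 2.0, 3.5.
print(binary_route(2.7, 2.0), nary_route(2.7, [1.0, 2.0, 3.5]))
```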
“…Although EMD has demonstrated strong performance for naive-Bayes (Dougherty et al 1995;Perner and Trautzsch 1998), it was developed in the context of top-down induction of decision trees. It uses MDL as the termination condition.…”
Section: Entropy Minimization Discretization
confidence: 99%
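The EMD referred to here is entropy minimization discretization in the Fayyad & Irani style: candidate cut points are scored by the weighted class entropy of the two subsets they induce, and the lowest-scoring cut is kept. A minimal sketch of that score, with illustrative helper names rather than any particular implementation:

```python
import math
from collections import Counter

def entropy(labels):
    """Class entropy of a label multiset, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def cut_entropy(xs, ys, t):
    """Weighted average class entropy of the two subsets induced by
    splitting feature values xs at threshold t (EMD's split score)."""
    left = [y for x, y in zip(xs, ys) if x <= t]
    right = [y for x, y in zip(xs, ys) if x > t]
    n = len(ys)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)

# Candidate cuts here are midpoints between consecutive sorted values;
# EMD keeps the one with the lowest weighted entropy (the optimum is
# known to lie at a point where the class label changes).
xs = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]           # already sorted
ys = ["a", "a", "a", "b", "b", "b"]
best = min(((x1 + x2) / 2 for x1, x2 in zip(xs, xs[1:])),
           key=lambda t: cut_entropy(xs, ys, t))
print(best)  # 2.25, the midpoint where the class changes
```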
“…Discretization of continuous attributes is fundamental to many decision tree algorithms and is therefore a well researched area in data mining [26]. Many decision tree algorithms such as ID3, C4.5 and CART all require binary splits at decision nodes [27]. The value at which this split occurs is usually determined by a discretization algorithm, although the difference here is that this discretization will occur in a dynamic manner as the tree is built, rather than occurring as a pre-processing step as occurs in nearest neighbour algorithms.…”
Section: B. Attribute Discretization
confidence: 99%
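To make the contrast with pre-processing concrete, the sketch below recomputes the cut point from the rows reaching each node while the tree is built, which is what dynamic discretization means in the quote above. The cut-selection rule is deliberately a trivial stand-in (the median) so the example stays short; all names are hypothetical.

```python
def choose_cut(values):
    """Stand-in for a real discretizer (e.g. the entropy-based score
    sketched earlier); here simply the median, to keep the example short."""
    s = sorted(values)
    return s[len(s) // 2]

def build_tree(rows, depth=0, max_depth=2):
    """Dynamic discretization: the cut for the (single) numeric feature
    is recomputed from the rows that reach this node, instead of being
    fixed once in a global pre-processing pass."""
    labels = [y for _, y in rows]
    if depth == max_depth or len(set(labels)) <= 1:
        return {"leaf": labels}
    t = choose_cut([x for x, _ in rows])      # local, node-specific cut
    left = [r for r in rows if r[0] <= t]
    right = [r for r in rows if r[0] > t]
    if not left or not right:
        return {"leaf": labels}
    return {"cut": t,
            "le": build_tree(left, depth + 1, max_depth),
            "gt": build_tree(right, depth + 1, max_depth)}

rows = [(0.5, "a"), (1.1, "a"), (2.3, "b"), (3.0, "b"), (4.2, "c"), (5.0, "c")]
print(build_tree(rows))
```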
“…For feature A, the boundary T min that minimises the entropy over all possible boundaries is selected [27]. The application of this will therefore result in a binary split, and the method can be applied recursively until a stopping criterion is met, in this case, a criterion based on the Minimum Description Length Principle.…”
Section: B. Attribute Discretization
confidence: 99%
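The Minimum Description Length stopping criterion mentioned here is, in the Fayyad & Irani formulation, a threshold on the information gain of the best cut. A minimal sketch of that test, assuming the standard formula (helper names are illustrative):

```python
import math
from collections import Counter

def ent(labels):
    """Class entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mdlp_accepts(y, y_left, y_right):
    """Fayyad & Irani MDL stopping test: keep recursing only while the
    information gain of the best cut exceeds the MDL-derived threshold."""
    n = len(y)
    gain = ent(y) - (len(y_left) / n) * ent(y_left) - (len(y_right) / n) * ent(y_right)
    k, k1, k2 = len(set(y)), len(set(y_left)), len(set(y_right))
    delta = math.log2(3 ** k - 2) - (k * ent(y) - k1 * ent(y_left) - k2 * ent(y_right))
    return gain > (math.log2(n - 1) + delta) / n

# A clean class boundary passes the test; a noisy split does not.
print(mdlp_accepts(["a"] * 5 + ["b"] * 5, ["a"] * 5, ["b"] * 5))          # True
print(mdlp_accepts(["a", "b"] * 5, ["a", "b"] * 3 + ["a"], ["b", "a", "b"]))  # False
```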