P-Prism: A Computationally Efficient Approach to Scaling up Classification Rule Induction

Stahl, Frederic; Bramer, Max; Adda, Mo

doi:10.1007/978-0-387-09695-7_8

Cited by 6 publications

(4 citation statements)

References 8 publications


“…Another version namely N-PRISM algorithm (Bramer, 2000) is proposed to resolve the problem of noisy data, whereas J-Pruning (Bramer, 2002) employs pre-pruning strategy. In 2008, Stahl and Barmer introduced Parallel PRISM (P-PRISM) (Stahl and Bramer, 2008) method to overcome PRISM’s excessive computational process of testing the entire population of data attribute inside the training data set.…”

Section: Background Of the Present Research (mentioning)

confidence: 99%

An e-healthcare system for disease prediction using hybrid data mining technique

Sarkar; Sana (2019). JM2


Purpose: The purpose of this study is to alleviate the specified issues to a great extent. To promote patients’ health via early prediction of diseases, knowledge extraction using data mining approaches shows an integral part of e-health system. However, medical databases are highly imbalanced, voluminous, conflicting and complex in nature, and these can lead to erroneous diagnosis of diseases (i.e. detecting class-values of diseases). In literature, numerous standard disease decision support system (DDSS) have been proposed, but most of them are disease specific. Also, they usually suffer from several drawbacks like lack of understandability, incapability of operating rare cases, inefficiency in making quick and correct decision, etc.

Design/methodology/approach: Addressing the limitations of the existing systems, the present research introduces a two-step framework for designing a DDSS, in which the first step (data-level optimization) deals in identifying an optimal data-partition (Popt) for each disease data set and then the best training set for Popt in parallel manner. On the other hand, the second step explores a generic predictive model (integrating C4.5 and PRISM learners) over the discovered information for effective diagnosis of disease. The designed model is a generic one (i.e. not disease specific).

Findings: The empirical results (in terms of top three measures, namely, accuracy, true positive rate and false positive rate) obtained over 14 benchmark medical data sets (collected from https://archive.ics.uci.edu/ml) demonstrate that the hybrid model outperforms the base learners in almost all cases for initial diagnosis of the diseases. After all, the proposed DDSS may work as an e-doctor to detect diseases.

Originality/value: The model designed in this study is original, and the necessary parallelized methods are implemented in C on Cluster HPC machine (FUJITSU) with total 256 cores (under one Master node).


“…The authors developed a strategy for parallel RI called Parallel Modular Classification Rule Induction (PMCRI). This strategy is a continuation of an early work by the same authors in 2008 which resulted in parallel PRISM (P-PRISM) (Stahl and Bramer, 2008). P-PRISM algorithm was disseminated to overcome PRISM's excessive computational process of testing the entire population of data attribute inside the training dataset.…”

Section: Prism Cons (mentioning)

confidence: 99%

“…(Stahl and Bramer, 2008) (Elgibreen and Aksoy, 2013) (Stahl and Bramer, 2014). This algorithm employs separate-and-conquer strategy in knowledge discovery in which PRISM generates rules according to the class labels in the training dataset.…”

Section: Introduction (mentioning)

confidence: 99%

Constrained dynamic rule induction learning

Thabtah; Qabajeh; Chiclana (2016). Expert Systems with Applications

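The separate-and-conquer strategy mentioned in the citation statement above can be illustrated with a minimal Python sketch. It follows the general shape of Prism-style induction (specialise a rule for one class until it covers only that class, then remove the covered instances and repeat), but it is not the cited authors' implementation. The function names `best_term` and `induce_rules`, the `"class"` label key, the tie-breaking rule and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def best_term(rows, target, used_attrs, label_key):
    """Return the (attribute, value) pair maximising P(target | attribute = value)."""
    covered, positive = Counter(), Counter()
    for row in rows:
        for attr, value in row.items():
            if attr == label_key or attr in used_attrs:
                continue
            covered[(attr, value)] += 1
            if row[label_key] == target:
                positive[(attr, value)] += 1
    if not covered:
        return None  # no unused attributes left
    # Assumed tie-break: prefer the term covering more target instances.
    return max(covered, key=lambda t: (positive[t] / covered[t], positive[t]))


def induce_rules(rows, label_key="class"):
    """Induce (terms, class) rules one class at a time (separate and conquer)."""
    rules = []
    for target in sorted({r[label_key] for r in rows}):
        remaining = list(rows)  # each class starts from the full training set
        while any(r[label_key] == target for r in remaining):
            subset, terms = remaining, {}
            # Conquer: add terms until the rule covers only the target class.
            while any(r[label_key] != target for r in subset):
                term = best_term(subset, target, terms, label_key)
                if term is None:
                    break  # clashing instances: attributes exhausted
                attr, value = term
                terms[attr] = value
                subset = [r for r in subset if r[attr] == value]
            rules.append((terms, target))
            # Separate: drop the instances the new rule covers.
            remaining = [
                r for r in remaining
                if not all(r[a] == v for a, v in terms.items())
            ]
    return rules


# Hypothetical toy data, not taken from any of the cited papers.
rows = [
    {"outlook": "sunny", "windy": "no", "class": "play"},
    {"outlook": "sunny", "windy": "yes", "class": "play"},
    {"outlook": "rainy", "windy": "no", "class": "play"},
    {"outlook": "rainy", "windy": "yes", "class": "stay"},
]
for terms, label in induce_rules(rows):
    print(terms, "->", label)
```

Every candidate term is re-scored against the current subset on each specialisation step, which is the repeated testing of attribute values that the citation statements describe as computationally expensive.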

“…The final concept description in the case of classification rule induction would be a set of classification rules. We developed a parallel modular classification rule induction framework for the Prism family and tested it on PrismTCS, the PMCRI (Parallel Modular Classification Rule Induction) framework [21], which applies to the CDM model. Parallelisation in the first layer is achieved by distributing all attribute lists evenly over n processors and processing them locally by algorithms L 1 to L n , which induce rule terms.…”

Section: Pmcri: a Parallel Modular Classification Rule Induction Fram (mentioning)

confidence: 99%

PMCRI: A Parallel Modular Classification Rule Induction Framework

Stahl; Bramer; Adda (2009). Machine Learning and Data Mining in Pattern Recognition (self-citation)


In a world where massive amounts of data are recorded on a large scale we need data mining technologies to gain knowledge from the data in a reasonable time. The Top Down Induction of Decision Trees (TDIDT) algorithm is a very widely used technology to predict the classification of newly recorded data. However alternative technologies have been derived that often produce better rules but do not scale well on large datasets. Such an alternative to TDIDT is the PrismTCS algorithm. PrismTCS performs particularly well on noisy data but does not scale well on large datasets. In this paper we introduce Prism and investigate its scaling behaviour. We describe how we improved the scalability of the serial version of Prism and investigate its limitations. We then describe our work to overcome these limitations by developing a framework to parallelise algorithms of the Prism family and similar algorithms. We also present the scale up results of a first prototype implementation.
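The citation statement for this paper describes distributing all attribute lists evenly over n processors, each of which induces rule terms locally. The Python sketch below shows one way that idea could look; it is an illustration under stated assumptions, not the PMCRI framework itself. The attribute-list layout (one list of `(value, class_label)` pairs per attribute), the round-robin split, the thread pool standing in for separate processors, and the names `distribute`, `local_best_term` and `global_best_term` are all hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def distribute(attribute_lists, n):
    """Deal attribute lists round-robin so partition sizes differ by at most one."""
    names = sorted(attribute_lists)
    return [{name: attribute_lists[name] for name in names[i::n]} for i in range(n)]


def local_best_term(partition, target):
    """Best (probability, attribute, value) among one worker's attribute lists."""
    best = None
    for attr, entries in partition.items():
        covered, positive = Counter(), Counter()
        for value, label in entries:
            covered[value] += 1
            if label == target:
                positive[value] += 1
        for value in covered:
            candidate = (positive[value] / covered[value], attr, value)
            if best is None or candidate > best:
                best = candidate
    return best  # None if this worker received no attribute lists


def global_best_term(attribute_lists, target, n):
    """Combine the workers' local candidates into one globally best rule term."""
    partitions = distribute(attribute_lists, n)
    # Threads stand in for processors here; this is an assumption of the sketch.
    with ThreadPoolExecutor(max_workers=n) as pool:
        local = list(pool.map(lambda p: local_best_term(p, target), partitions))
    # Assumed tie-break: tuples compare by probability, then attribute, then value.
    return max(c for c in local if c is not None)


# Hypothetical toy attribute lists, not taken from the cited paper.
attribute_lists = {
    "outlook": [("sunny", "play"), ("sunny", "play"), ("rainy", "play"), ("rainy", "stay")],
    "windy": [("no", "play"), ("yes", "play"), ("no", "play"), ("yes", "stay")],
}
print(global_best_term(attribute_lists, target="stay", n=2))
```

Because each worker only needs its own attribute lists to score candidate terms, the per-step work splits by attribute, and only the small local winners have to be exchanged to pick the global term.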
