Purpose
Knowledge extraction using data mining approaches is an integral part of e-health systems, which aim to promote patients’ health through early prediction of diseases. However, medical databases are highly imbalanced, voluminous, conflicting and complex in nature, and these properties can lead to erroneous diagnosis of diseases (i.e. incorrectly detected class values of diseases). Numerous disease decision support systems (DDSSs) have been proposed in the literature, but most of them are disease specific. They also usually suffer from several drawbacks, such as lack of understandability, inability to handle rare cases and inefficiency in making quick and correct decisions. The purpose of this study is to alleviate these issues to a great extent.
Design/methodology/approach
To address the limitations of the existing systems, the present research introduces a two-step framework for designing a DDSS. The first step (data-level optimization) deals with identifying an optimal data partition (Popt) for each disease data set and then the best training set for Popt, in a parallel manner. The second step explores a generic predictive model (integrating the C4.5 and PRISM learners) over the discovered information for effective diagnosis of disease. The designed model is generic (i.e. not disease specific).
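As a rough illustration of the data-level optimization step, the sketch below scores a set of candidate partitions concurrently and keeps the highest-scoring one as Popt. It is only a sketch under stated assumptions: the abstract does not say how partitions are represented or scored, so `select_best_partition`, `candidates` and `score_fn` are hypothetical names, and Python threads stand in for the parallel C implementation used in the study.

```python
from concurrent.futures import ThreadPoolExecutor


def select_best_partition(candidates, score_fn, max_workers=4):
    """Score every candidate partition concurrently; return (best, best_score).

    candidates: list of partition descriptions (hypothetical representation).
    score_fn:   callable mapping one candidate to a numeric quality score
                (assumed here; the study's actual criterion is not given).
    """
    if not candidates:
        raise ValueError("at least one candidate partition is required")

    # pool.map preserves input order, so scores[i] belongs to candidates[i].
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(score_fn, candidates))

    # Pick the index with the highest score (first one wins on ties).
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index], scores[best_index]


if __name__ == "__main__":
    # Toy example: candidates are training fractions, and the made-up score
    # simply prefers fractions close to 0.7.
    fractions = [0.5, 0.6, 0.7, 0.8]
    best, score = select_best_partition(fractions, lambda r: -(r - 0.7) ** 2)
    print(best, score)
```

The same pattern could be applied a second time to choose the best training set within Popt, with a different scoring function.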
Findings
The empirical results (in terms of three key measures, namely, accuracy, true positive rate and false positive rate) obtained over 14 benchmark medical data sets (collected from https://archive.ics.uci.edu/ml) demonstrate that the hybrid model outperforms the base learners in almost all cases for initial diagnosis of the diseases. Overall, the proposed DDSS may work as an e-doctor to detect diseases.
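For reference, the three measures named above can be computed from the counts of a confusion matrix. The following is a minimal sketch for a binary diagnosis task; the function name `diagnostic_measures` and the label encoding (1 = disease present) are assumptions made for illustration, and a multi-class data set would need a per-class (one-vs-rest) treatment not shown here.

```python
def diagnostic_measures(y_true, y_pred, positive=1):
    """Return (accuracy, true positive rate, false positive rate)."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # diseased case correctly flagged
            else:
                fp += 1  # healthy case wrongly flagged
        else:
            if actual == positive:
                fn += 1  # diseased case missed
            else:
                tn += 1  # healthy case correctly cleared

    total = tp + fp + tn + fn
    # Guard against empty denominators so the function never divides by zero.
    accuracy = (tp + tn) / total if total else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, tpr, fpr


if __name__ == "__main__":
    # Toy labels, not results from the study.
    actual = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 0, 1, 0, 1, 0, 0, 1]
    print(diagnostic_measures(actual, predicted))
```

On imbalanced medical data, accuracy alone can look high even when rare positive cases are missed, which is why the true and false positive rates are reported alongside it.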
Originality/value
The model designed in this study is original, and the necessary parallelized methods are implemented in C on a FUJITSU HPC cluster with a total of 256 cores (under one master node).