In healthcare management, a large volume of multi-structured patient data is generated from clinical reports, doctors' notes, and wearable body sensors. The analysis of healthcare parameters and the prediction of subsequent health conditions are still in the informative stage. A cloud-enabled big data analytic platform is the best way to analyze the structured and unstructured data generated from healthcare management systems. In this paper, a probabilistic data collection mechanism is designed, and a correlation analysis of the collected data is performed. Finally, a stochastic prediction model is designed to foresee the future health condition of the most correlated patients based on their current health status. The proposed protocols are evaluated through extensive simulations in the cloud environment, which yield about 98% prediction accuracy and maintain 90% CPU and bandwidth utilization to reduce the analysis time.
INDEX TERMS Big data, cloud, healthcare, prediction.
This study uses machine learning (ML) in clinical research to predict the tumor stage in the TNM (tumor, node, and metastasis) staging of colon cancer from the most influential histopathology parameters, and to predict the five-year disease-free survival (DFS) period. From the colorectal cancer (CRC) registry of Chang Gung Memorial Hospital, Linkou, Taiwan, 4021 patients were selected for the analysis. Various ML algorithms were applied to predict the tumor stage of colon cancer, with the Tumor Aggression Score (TAS) considered as a prognostic factor. The performance of the ML algorithms was evaluated using five-fold cross-validation, an effective method of model validation. The accuracy of the algorithms was determined for two cases: standard TNM staging, and TNM staging with the Tumor Aggression Score. The Random Forest model achieved an F-measure of 0.89 when the Tumor Aggression Score was included as an attribute alongside the standard attributes normally used for TNM stage prediction. The Random Forest algorithm also outperformed all other algorithms in predicting the five-year DFS, with an accuracy of approximately 84% and an area under the curve (AUC) of 0.82 ± 0.10.
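As a rough illustration of the evaluation procedure described above, the following sketch shows how five-fold cross-validation and the F-measure fit together. It is not the authors' pipeline: the data are synthetic, and a simple one-feature threshold classifier (the hypothetical `fit_threshold` and `predict_threshold`) stands in for the Random Forest model. All function names here are assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]


def f_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # convention: no true positives gives an F-measure of 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fit_threshold(xs, ys):
    """Hypothetical stand-in model: midpoint between the two class means.

    Assumes both classes (0 and 1) appear in the training data.
    """
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, x):
    """Predict class 1 when the feature exceeds the learned threshold."""
    return 1 if x > threshold else 0


def cross_validate(features, labels, fit, predict, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, and repeat for every fold."""
    folds = k_fold_indices(len(labels), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = fit([features[j] for j in train_idx],
                    [labels[j] for j in train_idx])
        preds = [predict(model, features[j]) for j in test_idx]
        scores.append(f_measure([labels[j] for j in test_idx], preds))
    return scores


if __name__ == "__main__":
    # Synthetic one-feature data; it has no relation to the CRC registry.
    xs = list(range(20))
    ys = [0] * 10 + [1] * 10
    fold_scores = cross_validate(xs, ys, fit_threshold, predict_threshold)
    print("per-fold F-measure:", fold_scores)
    print("mean F-measure:", sum(fold_scores) / len(fold_scores))
```

Each sample is held out exactly once, so the mean of the per-fold scores uses every record for testing without ever scoring a model on data it was trained on.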