A major task in data pre-processing, which is one of the primary stages of data mining, is the imputation of missing values; this paper addresses it for hepatitis disease diagnosis. Many health datasets are incomplete. Simply removing the cases with missing values from the original dataset can create more problems than it solves. An appropriate missing value imputation technique can help produce high-quality datasets for better analysis in clinical trials. This paper investigates the use of a machine learning technique as a missing value imputation process for incomplete hepatitis data. Mean/mode imputation, ID3 algorithm imputation, decision tree imputation, and the proposed bootstrap aggregation (bagging) based imputation are applied to the missing values, and the resulting datasets are classified using k-nearest neighbours (KNN). The experiment shows that classifier performance improves when the bagging-based imputation algorithm is used to predict missing attribute values.
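The idea of bagging-based imputation can be illustrated with a minimal sketch. The abstract does not give the base learner, the number of bootstrap rounds, or the data layout, so everything below is an assumption made for illustration: the base learner is a one-level decision stump, the function names (`fit_stump`, `bagging_impute`) are hypothetical, and the toy records are invented rather than taken from the hepatitis dataset. Missing entries are marked with `None`, and each one is filled by a majority vote over stumps trained on bootstrap resamples of the rows where the attribute is known.

```python
import random
from collections import Counter


def mode(values):
    """Return the most common value (ties go to the value seen first)."""
    return Counter(values).most_common(1)[0][0]


def fit_stump(rows, target, features):
    """Fit a one-level decision tree (an assumed base learner).

    For every candidate feature, predict the target by the majority target
    value within each feature value, and keep the feature with the fewest
    training errors. A missing feature value (None) is treated as just
    another category.
    """
    best = None
    for f in features:
        groups = {}
        for r in rows:
            groups.setdefault(r[f], []).append(r[target])
        table = {v: mode(ys) for v, ys in groups.items()}
        errors = sum(table[r[f]] != r[target] for r in rows)
        if best is None or errors < best[0]:
            best = (errors, f, table)
    _, f, table = best
    default = mode([r[target] for r in rows])  # fallback for unseen values
    return lambda r: table.get(r[f], default)


def bagging_impute(rows, target, n_estimators=25, seed=0):
    """Fill None entries in column `target` by bagged majority vote.

    Each stump is trained on a bootstrap resample (drawn with replacement)
    of the rows whose target value is known. Rows that get a value filled
    in are copied, so the input rows are not modified.
    """
    rng = random.Random(seed)
    features = [i for i in range(len(rows[0])) if i != target]
    known = [r for r in rows if r[target] is not None]
    stumps = []
    for _ in range(n_estimators):
        sample = [rng.choice(known) for _ in known]
        stumps.append(fit_stump(sample, target, features))
    filled = []
    for r in rows:
        if r[target] is None:
            r = list(r)
            r[target] = mode([stump(r) for stump in stumps])
        filled.append(r)
    return filled


# Invented toy records: two categorical attributes and one attribute
# (column 2) that is missing in the last two rows.
records = [
    ["yes", "no", "high"],
    ["yes", "yes", "high"],
    ["no", "no", "low"],
    ["no", "yes", "low"],
    ["yes", "no", "high"],
    ["no", "yes", "low"],
    ["yes", "yes", None],
    ["no", "no", None],
]
filled = bagging_impute(records, target=2)
print(filled[6], filled[7])
```

In a comparison like the one the abstract describes, the filled dataset would then be passed to a KNN classifier alongside the datasets produced by the other imputation methods; that step is omitted here.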
Medical databases have accumulated huge amounts of information about patients and their medical conditions. Relationships and patterns within the data can provide new medical knowledge. The large volume of Electronic Health Records (EHRs) collected over the years provides a rich base for risk analysis and prediction. An EHR contains digitally stored healthcare information about an individual, such as observations, laboratory tests, diagnostic reports, medications, patient identifying information, and allergies. A special type of EHR is the Health Examination Record (HER) from annual general health checkups. The fundamental challenge of learning a classification model for risk prediction lies in the unlabelled data that constitutes most of the collected dataset. In particular, the unlabelled data describes participants in health examinations whose health conditions can vary greatly, from healthy to very ill. There is no ground truth for differentiating their states of health. Identifying participants at risk based on their current and past HERs is important for early warning and preventive intervention. Here, risk means unwanted outcomes such as mortality and morbidity. The proposed system presents a semi-supervised learning algorithm to handle a challenging multi-class classification problem with a substantial number of unlabelled cases. The algorithm constructs a training set from diabetes records with unlabelled classes and performs risk analysis with user query reports. The process shows a new way of predicting risks for participants based on their annual health examinations.
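The abstract does not name the specific semi-supervised algorithm, so the following is only a sketch of one common approach, self-training, swapped in for illustration. It assumes numeric feature vectors, a nearest-centroid classifier, and a confidence rule based on the distance gap between the two closest class centroids; the function name `self_train`, the `margin` parameter, the class labels, and the toy data are all hypothetical. Unlabelled cases that are classified confidently are moved into the training set, and ambiguous cases stay unlabelled.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def self_train(labelled, unlabelled, margin=1.0, max_rounds=10):
    """Grow a training set by self-training with a nearest-centroid rule.

    labelled:   list of (features, label) pairs
    unlabelled: list of feature vectors
    In each round, class centroids are recomputed from the current
    training set. An unlabelled point is pseudo-labelled only if its
    nearest centroid is closer than the second nearest by at least
    `margin`; otherwise it remains in the unlabelled pool.
    Returns (training_set, remaining_unlabelled).
    """
    training = list(labelled)
    pool = list(unlabelled)
    for _ in range(max_rounds):
        by_class = {}
        for x, y in training:
            by_class.setdefault(y, []).append(x)
        centroids = {y: centroid(xs) for y, xs in by_class.items()}
        confident, rest = [], []
        for x in pool:
            ranked = sorted((math.dist(x, c), y) for y, c in centroids.items())
            if len(ranked) == 1 or ranked[1][0] - ranked[0][0] >= margin:
                confident.append((x, ranked[0][1]))
            else:
                rest.append(x)
        if not confident:
            break  # nothing new was labelled, so further rounds change nothing
        training.extend(confident)
        pool = rest
    return training, pool


# Invented toy data: three risk classes, one labelled example each.
seed_cases = [
    ([0.0, 0.0], "healthy"),
    ([10.0, 10.0], "at_risk"),
    ([20.0, 0.0], "ill"),
]
unlabelled_cases = [[1.0, 1.0], [9.0, 9.0], [19.0, 1.0], [5.0, 5.0]]

training_set, still_unlabelled = self_train(seed_cases, unlabelled_cases)
print(len(training_set), still_unlabelled)
```

The point `[5.0, 5.0]` is equidistant from two class centroids, so it is left unlabelled rather than forced into a class. This mirrors the situation described in the abstract, where no ground truth exists for many participants.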