Real-time polymerase chain reaction (RT-PCR), commonly known as the swab test, is a laboratory test that diagnoses COVID-19 from respiratory samples. Because of the rapid worldwide spread of the coronavirus, RT-PCR testing alone has not been able to deliver results quickly enough. This created a need for complementary diagnostic methods, and machine learning studies have begun to address it. Medical data, however, is challenging to work with because it is often inconsistent, incomplete, difficult to scale, and very large. In addition, poor clinical decisions, irrelevant parameters, and limited data adversely affect the accuracy of such studies. Datasets containing COVID-19 blood parameters are currently scarcer than other medical datasets, so this study aims to improve the existing ones. To obtain more consistent results in COVID-19 machine learning studies, it investigates the effect of data preprocessing techniques on the classification of COVID-19 data. First, categorical feature encoding and feature scaling were applied to a dataset of 15 features containing blood data from 279 patients, including gender and age. Missing values were then imputed using both the K-nearest neighbor (KNN) algorithm and multiple imputation by chained equations (MICE). The classes were balanced with the synthetic minority oversampling technique (SMOTE). The effect of these preprocessing techniques was analyzed on the ensemble learning algorithms bagging, AdaBoost, and random forest, and on the popular classifiers KNN, support vector machine, logistic regression, artificial neural network, and decision tree.
With SMOTE applied, the highest accuracies obtained with the bagging classifier were 83.42% and 83.74% for KNN and MICE imputation, respectively. Without SMOTE, the highest accuracy reached with the same classifier was 83.91%, for KNN imputation. In conclusion, selected data preprocessing techniques were compared, their effect on classification performance was presented, and the experiments demonstrated the importance of choosing the right combination of preprocessing steps.
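The class-balancing step can be illustrated with a simplified SMOTE-style sketch. This is not the authors' implementation: the function name `smote_sketch`, its parameters, and the toy points are assumptions made for illustration, and the base sample is picked at random rather than by iterating over every minority sample. Each synthetic point is placed on the line segment between a minority sample and one of its nearest minority-class neighbors.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by interpolation (simplified SMOTE).

    minority    -- list of feature vectors (lists of floats) from the minority class
    n_synthetic -- number of synthetic samples to create
    k           -- number of nearest minority neighbours considered per sample
    seed        -- seed for reproducibility (hypothetical parameter)
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate at a random position between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class (made-up values, not from the study's dataset).
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(toy_minority, n_synthetic=6, k=2)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region the minority class already occupies. Oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.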
Diabetes has become a pervasive and endemic health problem worldwide. It is a chronic, life-threatening disease that can cause health problems in many organs, such as the heart, kidneys, eyes, nerves, and blood vessels. Early prevention techniques are needed to reduce the fatality rate from diabetes. Machine learning techniques are now used to predict or detect life-threatening diseases such as cancer, diabetes, heart disease, and thyroid disease. This study presents a prediction model for diabetes mellitus built on the Pima Indian dataset. Three machine learning techniques, Decision Tree (DT), Random Forest (RF), and Gradient Boosting (GB), were used to predict diabetes mellitus, and their performance was analyzed. The confusion matrix, accuracy, F1 score, precision, recall, and Cohen's kappa were evaluated, and a ROC curve was plotted. Of the three techniques, GB achieved the best results.
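The evaluation metrics named above can all be derived from a binary confusion matrix. The following is a minimal sketch; the function name `binary_metrics` and the example counts are hypothetical and are not results from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common classification metrics from confusion-matrix counts.

    tp, fp, fn, tn -- true positives, false positives,
                      false negatives, true negatives
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)  # both say "positive"
    p_no = ((fn + tn) / total) * ((fp + tn) / total)   # both say "negative"
    p_expected = p_yes + p_no
    kappa = ((accuracy - p_expected) / (1 - p_expected)
             if p_expected != 1 else 0.0)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


# Made-up counts for illustration only.
example = binary_metrics(tp=40, fp=10, fn=20, tn=30)
```

Cohen's kappa is useful alongside accuracy because it discounts the agreement a classifier would reach by chance, which matters when the classes are imbalanced.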
This paper proposes an automatic colorization method for black-and-white images. The method is based on the convolutional neural network (CNN), one of the best-known deep learning algorithms. The model takes a grayscale image as input and predicts its colors based on the dataset it was trained on. The work uses the Lab color space: the model takes the L channel as input and produces the ab channels as output. Randomly selected images from the ImageNet dataset were used to construct a mini dataset of 39,604 images, split into 80% training and 20% testing. The proposed method was evaluated on sample images using mean squared error (MSE) and peak signal-to-noise ratio (PSNR), reaching an average MSE of 51.36 and an average PSNR of 31.
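The two evaluation measures are related by a simple formula, sketched below. The abstract does not state the peak pixel value, so the default of 255 (8-bit images) is an assumption, and the function names are illustrative.

```python
import math


def mse(reference, predicted):
    """Mean squared error between two equal-length sequences of pixel values."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)


def psnr_from_mse(mse_value, peak=255.0):
    """Peak signal-to-noise ratio in decibels for a given MSE.

    peak -- maximum possible pixel value (255 assumed here for 8-bit images)
    """
    if mse_value == 0:
        return math.inf  # identical images: noise power is zero
    return 10.0 * math.log10(peak ** 2 / mse_value)


# PSNR corresponding to the reported average MSE, under the 8-bit assumption.
reported_psnr = psnr_from_mse(51.36)
```

Under the 8-bit assumption, an MSE of 51.36 gives a PSNR of roughly 31 dB, which agrees with the figures reported in the abstract. Note that an average of per-image PSNR values is not in general equal to the PSNR of the average MSE, so this is a consistency check rather than a reconstruction of how the averages were computed.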