Real-time polymerase chain reaction (RT-PCR), commonly known as the swab test, is a laboratory diagnostic test that detects COVID-19 from respiratory samples. Because of the rapid worldwide spread of the coronavirus, RT-PCR testing alone could not deliver results quickly enough. This created a need for complementary diagnostic methods, and machine learning studies in this area began. Medical data, however, are challenging to work with because they are often inconsistent, incomplete, difficult to scale, and very large. In addition, poor clinical decisions, irrelevant parameters, and limited data reduce the accuracy of such studies. Datasets of COVID-19 blood parameters are currently scarcer than other medical datasets, so this study aims to improve the existing ones. To that end, and to obtain more consistent results in COVID-19 machine learning studies, the effect of data preprocessing techniques on the classification of COVID-19 data was investigated. First, categorical feature encoding and feature scaling were applied to a dataset with 15 features containing blood data from 279 patients, including gender and age. Missing values were then filled in using both K-nearest neighbor (KNN) imputation and multiple imputation by chained equations (MICE). The classes were balanced with the synthetic minority oversampling technique (SMOTE). The effect of these preprocessing techniques was analyzed on the ensemble learning algorithms bagging, AdaBoost, and random forest, and on the popular classifiers KNN, support vector machine, logistic regression, artificial neural network, and decision tree.
With SMOTE applied, the highest accuracies obtained with the bagging classifier were 83.42% and 83.74% for KNN and MICE imputation, respectively. Without SMOTE, the highest accuracy reached with the same classifier was 83.91%, for KNN imputation. In conclusion, selected data preprocessing techniques are compared, their effect on classification success is presented, and the experiments demonstrate the importance of choosing the right combination of preprocessing steps.
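The two preprocessing ideas named above, KNN imputation and SMOTE, can be illustrated with a minimal sketch. It uses only the Python standard library and is not the study's implementation: the function names `knn_impute` and `smote`, the default `k=3`, and the choice to measure distance only over features observed in both rows are all assumptions made for illustration. Features are assumed to be numeric and already scaled.

```python
import math
import random


def knn_impute(rows, k=3):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Assumption: distance is the root-mean-square difference over the
    features that are observed in both rows.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:  # leave the gap if no donor row exists
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen sample and one of its k nearest minority neighbours.

    Assumption: `minority` holds at least two complete numeric samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

In a pipeline like the one described, imputation would run before balancing, because the interpolation step in `smote` needs complete feature vectors. To avoid leaking information, oversampling is normally applied to the training split only.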
Abstract
Sleep disorders, recently one of society's most important problems, seriously affect individuals' health and quality of life. Many ailments are caused by sleep disorders such as insomnia, narcolepsy, sleep apnea, and restless legs syndrome. The main factor behind sleep disorders is sleep interruptions that do not end in awakening but lower sleep quality. These transient interruptions are called arousals, and each represents an abrupt change in the pattern of brain-wave (electroencephalogram, EEG) activity. Arousal detection is usually performed on EEG data according to criteria set by the American Academy of Sleep Medicine (AASM). The aim of this study is to detect arousals in patients from EEG signals in line with the AASM criteria. To this end, the single-channel (C3/A2) EEG signals of the 5 patients included in the study were first preprocessed by filtering, normalization, and segmentation, in that order. Next, feature extraction with the Power Spectral Density (PSD) and Discrete Wavelet Transform (DWT) methods produced two feature sets for the EEG signal segments, and combining them produced a third. The Wrapper Subset Evaluation (WSE) feature selection method was then applied to the three feature sets to identify the effective features. Finally, the selected features were classified with Artificial Neural Network (ANN) and Random Forest (RF) algorithms to detect the EEG segments containing arousals.
These experiments showed that quite successful results can be obtained with single-channel EEG signals alone, without requiring any polysomnography (PSG) recording other than the EEG. The highest accuracy, 99.05%, was obtained with the effective features of Feature Set 3 and the ANN.
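The feature-extraction step described above (PSD features, DWT features, and their concatenation into a third set) might look roughly like the following standard-library sketch. Several things here are assumptions rather than details from the abstract: the periodogram as the PSD estimator, the Haar wavelet, three decomposition levels, the use of sub-band energies as DWT features, and the frequency bands passed in by the caller. The names `periodogram`, `band_power`, `haar_dwt`, and `segment_features` are hypothetical.

```python
import cmath
import math


def periodogram(segment, fs):
    """One-sided periodogram of a mean-removed segment via a direct DFT."""
    n = len(segment)
    mean = sum(segment) / n
    centered = [x - mean for x in segment]
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / (fs * n))
    return freqs, power


def band_power(freqs, power, low, high):
    """Sum of periodogram values whose frequency falls in [low, high)."""
    return sum(p for f, p in zip(freqs, power) if low <= f < high)


def haar_dwt(signal, levels):
    """Orthonormal Haar DWT: returns [detail_1, ..., detail_L, approx_L].

    Assumption: len(signal) is divisible by 2 ** levels.
    """
    coeffs, approx = [], list(signal)
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        coeffs.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    coeffs.append(approx)
    return coeffs


def segment_features(segment, fs, bands, levels=3):
    """Concatenate PSD band powers (set 1) and DWT sub-band energies (set 2)."""
    freqs, power = periodogram(segment, fs)
    psd_features = [band_power(freqs, power, lo, hi) for lo, hi in bands]
    dwt_features = [sum(c * c for c in band) for band in haar_dwt(segment, levels)]
    return psd_features + dwt_features  # the combined vector plays the role of set 3
```

A caller would pass one preprocessed EEG segment, the sampling rate, and a list of `(low, high)` bands in Hz, for example the conventional delta, theta, alpha, and beta ranges. The direct DFT is quadratic in the segment length, so a real pipeline would use an FFT library instead.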
Asymptomatic presentation of COVID-19 complicates the detection of infected individuals. In addition, the virus has produced many genomic variants, which increases its ability to spread. Because no specific treatment for COVID-19 could be developed in a short time, the essential goal is to reduce the virulence of the disease. Blood parameters, which carry essential clinical information about infectious diseases and are easy to obtain, have an important place in COVID-19 detection. The convolutional neural network (CNN) architecture, popular in image processing, yields highly successful COVID-19 detection models. The literature shows that CNN-based COVID-19 studies generally use lung images. In this study, one-dimensional (1D) blood parameter data were converted into two-dimensional (2D) image data after preprocessing, and COVID-19 was detected with a CNN. The t-distributed stochastic neighbor embedding (t-SNE) method was applied to project the feature vectors onto the 2D plane. All data were framed with convex hull and minimum bounding rectangle algorithms to obtain image data. The image data obtained by pixel mapping were fed to the developed 3-line CNN architecture. By combining low-cost, rapidly accessible blood parameters with a CNN architecture that excels at image data, this study proposes an effective model for COVID-19 detection. COVID-19 was ultimately detected with a success rate of 94.85%. The study brings a new perspective to COVID-19 detection by deriving 2D image data from 1D COVID-19 blood parameters and classifying them with a CNN.
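The framing and pixel-mapping steps can be sketched as follows, under stated assumptions. The code does not compute t-SNE: it assumes each feature already has a 2D location, passed in as `coords`, and that feature values are scaled to [0, 1]. The convex hull uses Andrew's monotone chain. The minimum-area rectangle is found by testing each hull edge as a candidate side, relying on the standard result that a minimum-area enclosing rectangle has a side collinear with a hull edge. Keeping the larger value when two features land on the same pixel is an illustrative choice. All function names are hypothetical and the abstract does not confirm any of these details.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def rotate(points, angle):
    """Rotate points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(x * c - y * s, x * s + y * c) for x, y in points]


def min_rect_angle(hull):
    """Rotation that makes a minimum-area bounding rectangle axis-aligned."""
    best_area, best_angle = float("inf"), 0.0
    for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]):
        angle = -math.atan2(y2 - y1, x2 - x1)  # align this edge with the x-axis
        xs, ys = zip(*rotate(hull, angle))
        area = (max(xs) - min(xs)) * (max(ys) - min(ys))
        if area < best_area:
            best_area, best_angle = area, angle
    return best_angle


def to_image(coords, values, size):
    """Write each feature value at the pixel given by its rotated 2D location."""
    rotated = rotate(coords, min_rect_angle(convex_hull(coords)))
    xs, ys = zip(*rotated)
    x0, y0 = min(xs), min(ys)
    width = (max(xs) - x0) or 1.0   # guard against a degenerate extent
    height = (max(ys) - y0) or 1.0
    image = [[0.0] * size for _ in range(size)]
    for (x, y), v in zip(rotated, values):
        col = min(int((x - x0) / width * size), size - 1)
        row = min(int((y - y0) / height * size), size - 1)
        image[row][col] = max(image[row][col], v)  # assumed collision rule
    return image
```

With this layout, `coords` is computed once for the whole dataset and each patient record becomes one `size` by `size` image by calling `to_image` with that record's feature values. Rotating to the minimum-area rectangle before rasterizing wastes fewer pixels than an axis-aligned box when the embedded points lie along a diagonal.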