The coronavirus disease 2019 (COVID-19) is wreaking havoc around the world, and great efforts are underway to control it. Millions of people are now being tested, and their data keeps accumulating in large volumes. This data can be used to classify newly tested persons according to whether or not they have the disease. However, standard classification techniques are hampered by the fact that the data is typically both incomplete and heterogeneous. To address this two-fold obstacle, we propose a KNN variant (KNNV) algorithm that accurately and efficiently classifies COVID-19 cases. The two main ideas behind the proposed algorithm are that, for each instance to be classified, it chooses the parameter K adaptively and calculates the distances to other instances in a novel way. KNNV was implemented and tested on a COVID-19 dataset from the Italian Society of Medical and Interventional Radiology. It was also compared to three algorithms of its category. The test results show that KNNV can efficiently and accurately classify COVID-19 patients. The comparison results show that the algorithm greatly outperforms all its competitors in terms of four metrics: precision, recall, accuracy, and F-Score.
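The abstract does not disclose KNNV's adaptive-K rule or its distance calculation, so the following is only a minimal, generic sketch of per-query adaptive K selection: each candidate K is scored by leave-one-out agreement in the query's local neighbourhood, and plain Euclidean distance is used. All names and the selection heuristic here are illustrative assumptions, not the published KNNV procedure.

```python
# Generic sketch of per-query adaptive K (NOT the authors' KNNV rule).
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    """Classic KNN majority vote with Euclidean distance."""
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(d)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

def adaptive_k(X_train, y_train, x, k_grid=(1, 3, 5, 7, 9)):
    """Pick K for this query by leave-one-out accuracy on the points
    closest to the query (a hypothetical heuristic for illustration)."""
    d = np.linalg.norm(X_train - x, axis=1)
    local = np.argsort(d)[:max(k_grid) + 1]          # local region around x
    best_k, best_score = k_grid[0], -1.0
    for k in k_grid:
        hits = 0
        for i in local:
            mask = np.ones(len(X_train), dtype=bool)
            mask[i] = False                          # leave point i out
            if knn_predict(X_train[mask], y_train[mask], X_train[i], k) == y_train[i]:
                hits += 1
        score = hits / len(local)
        if score > best_score:
            best_k, best_score = k, score
    return best_k

# Tiny synthetic demo (random data, not the COVID-19 dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
query = rng.normal(size=4)
k = adaptive_k(X, y, query)
print("chosen K:", k, "predicted class:", knn_predict(X, y, query, k))
```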
The original K-nearest neighbour (KNN) algorithm was designed to classify homogeneous complete data, that is, data with only numerical features whose values all exist. It therefore faces problems when applied to heterogeneous incomplete (HI) data, which also contains categorical features and is plagued with missing values. Many solutions have been proposed over the years, but most have pitfalls. For example, some handle heterogeneity by converting categorical features into numerical ones, inflicting structural damage. Others handle incompleteness by imputation or elimination, causing semantic disturbance. Almost all use the same K for all query objects, leading to misclassification. In the present work, we introduce KNNHI, a KNN-based algorithm for HI data classification that avoids all these pitfalls. Leveraging rough set theory, KNNHI preserves both categorical and numerical features, leaves missing values untouched, and uses a different K for each query. The end result is an accurate classifier, as demonstrated by extensive experimentation on nine datasets, mostly from the University of California Irvine repository, using 10-fold cross-validation. We show that KNNHI outperforms six recently published KNN-based algorithms in terms of precision, recall, accuracy and F-Score. Besides serving as a powerful classifier, KNNHI can also act as a K calculator, helping KNN-based algorithms that use a single K value for all queries to find the best such value. Indeed, we show how four such algorithms improve their performance using the K obtained by KNNHI. Finally, KNNHI exhibits impressive resilience to the degree of incompleteness, the degree of heterogeneity, and the metric used to measure distance.
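KNNHI's rough-set machinery is not described in the abstract, so the sketch below only illustrates the general idea of comparing mixed records directly, using a HEOM-style distance as an assumed stand-in: categorical features contribute a 0/1 mismatch, numeric features a range-normalised difference, and any comparison involving a missing value contributes the maximum distance of 1, so nothing is imputed, converted, or dropped. Feature names, ranges, and records are hypothetical.

```python
# HEOM-style heterogeneous distance (an illustrative stand-in, not KNNHI).
import math

def heom(a, b, numeric_ranges):
    """Distance between two records stored as dicts.
    `numeric_ranges` maps numeric feature names to (min, max) taken from the
    data; every other feature is treated as categorical. None marks a missing
    value and is left untouched (it simply yields the maximal distance 1)."""
    total = 0.0
    for f in a.keys() | b.keys():
        x, y = a.get(f), b.get(f)
        if x is None or y is None:
            d = 1.0                                    # missing value: maximal distance
        elif f in numeric_ranges:
            lo, hi = numeric_ranges[f]
            d = abs(x - y) / (hi - lo) if hi > lo else 0.0
        else:
            d = 0.0 if x == y else 1.0                 # categorical mismatch
        total += d * d
    return math.sqrt(total)

# Toy records with a categorical feature and a missing numeric value.
p1 = {"age": 63, "fever": "yes", "spo2": 91}
p2 = {"age": 47, "fever": "no",  "spo2": None}
print(heom(p1, p2, numeric_ranges={"age": (0, 100), "spo2": (70, 100)}))
```

The design point this illustrates is the one the abstract emphasises: the data keeps its original form, with no numerical encoding of categories and no imputation or deletion of missing entries.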