Class overlap and class imbalance are two data complexities that challenge the design of effective classifiers in Pattern Recognition and Data Mining, as they may cause a significant loss in performance. Several solutions have been proposed to address both difficulties, but most of these approaches tackle each problem separately. In this paper, we propose a two-stage under-sampling technique that handles class overlap and imbalance simultaneously, with the aim of improving classifier performance: the DBSCAN clustering algorithm removes noisy samples and cleans the decision boundary, and a minimum spanning tree algorithm then addresses the class imbalance. An extensive experimental study on both real-life and synthetic databases shows that the new algorithm performs significantly better than 12 state-of-the-art under-sampling methods with three standard classification models (nearest neighbor rule, J48 decision tree, and support vector machine with a linear kernel).
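The two-stage idea can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say which class DBSCAN is applied to, nor how the minimum spanning tree selects the samples to keep. The sketch assumes that noise filtering runs on the majority class only, and that redundancy is reduced by repeatedly dropping one endpoint of the shortest tree edges until the majority class matches the minority class in size. All function names and the parameters eps and min_pts are hypothetical.

```python
import math

def dbscan_noise_filter(points, eps, min_pts):
    """Keep the points DBSCAN would label as core or border; drop noise."""
    n = len(points)
    # eps-neighborhood of each point (includes the point itself)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]
    # a point is noise if it is neither core nor within eps of a core point
    keep = [core[i] or any(core[j] for j in neighbors[i]) for i in range(n)]
    return [p for p, k in zip(points, keep) if k]

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph: (weight, i, j) edges."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    in_tree[0] = True
    # best[k] = (distance from k to the tree, tree vertex realizing it)
    best = [(math.dist(points[0], points[k]), 0) for k in range(n)]
    edges = []
    for _ in range(n - 1):
        j = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k][0])
        edges.append((best[j][0], best[j][1], j))
        in_tree[j] = True
        for k in range(n):
            if not in_tree[k]:
                d = math.dist(points[j], points[k])
                if d < best[k][0]:
                    best[k] = (d, j)
    return edges

def mst_undersample(points, target):
    """Assumed rule: drop one endpoint of the shortest MST edges until
    only `target` points remain (never fewer than one point)."""
    kept = list(points)
    while len(kept) > max(target, 1):
        removed = set()
        for _, i, j in sorted(mst_edges(kept)):
            if len(kept) - len(removed) <= target:
                break
            if i not in removed and j not in removed:
                removed.add(j)
        kept = [p for idx, p in enumerate(kept) if idx not in removed]
    return kept

def two_stage_undersample(majority, minority, eps, min_pts):
    """Stage 1: remove majority-class noise; stage 2: balance via the MST rule."""
    cleaned = dbscan_noise_filter(majority, eps, min_pts)
    return mst_undersample(cleaned, len(minority))

# Toy data: a dense 3x3 grid of majority points plus one isolated outlier.
majority = [(x * 0.1, y * 0.1) for x in range(3) for y in range(3)] + [(5.0, 5.0)]
minority = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
result = two_stage_undersample(majority, minority, eps=0.15, min_pts=3)
```

In this toy run the isolated point is discarded in the first stage, and the second stage thins the dense grid down to the minority class size. A practical version would more likely rely on a library DBSCAN and a sparse-graph spanning tree routine, since the pairwise distance loops here are quadratic.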
With the outbreak of the SARS-CoV-2 (COVID-19) pandemic, multiple studies have examined risk factors and their influence on patient deaths. However, little attention has been paid to analyzing patients in risk groups, even though infected inpatients in these groups can survive. Using the dataset made available by the Ministry of Health of Mexico, this paper proposes the use of the latent topic extraction algorithm Latent Dirichlet Allocation (LDA) to study COVID-19 survival factors in Mexico. The results lead us to conclude that, in the year before strategies for the prevention and control of COVID-19 were in place, the latent topics indicate a low risk of death for patients without comorbidities, whereas in 2021 patients could survive despite having some risk factors.