Background: Because blood testing is radiation-free, low-cost and simple to perform, some researchers use machine learning to detect COVID-19 from blood test data. However, few studies take into consideration the imbalanced data distribution, which can impair the performance of a classifier. Method: A novel combined dynamic ensemble selection (DES) method is proposed for imbalanced data to detect COVID-19 from complete blood count data. This method combines data preprocessing with an improved DES. First, we use the hybrid synthetic minority over-sampling technique and edited nearest neighbor (SMOTE-ENN) to balance the data and remove noise. Second, to improve the performance of DES, a novel hybrid multiple clustering and bagging classifier generation (HMCBCG) method is proposed to reinforce the diversity and local regional competence of candidate classifiers. Results: Experimental results based on three popular DES methods show that HMCBCG performs better than bagging alone. HMCBCG+KNE obtains the best performance for COVID-19 screening, with 99.81% accuracy, 99.86% F1, 99.78% G-mean and 99.81% AUC. Conclusion: Compared with other advanced methods, our combined DES model improves the accuracy, G-mean, F1 and AUC of COVID-19 screening.
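The over-sampling half of SMOTE-ENN can be illustrated with a short interpolation routine. This is a minimal sketch of the general SMOTE idea, not the authors' implementation: the function name `smote_sketch`, the toy minority points, the neighbour count `k` and the seed are all assumptions made for illustration, and the ENN cleaning step is omitted.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (hypothetical helper)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic

# Toy minority class with two features (made-up values).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
```

Each synthetic point lies on the segment between two existing minority samples, so the minority class grows without simply duplicating records.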
In the context of the coronavirus disease (COVID-19) outbreak, this paper proposes an innovative and systematic decision support model based on Bayesian networks (BNs) to identify and control the risk of COVID-19 patients spreading the virus. The model is built in three steps. First, by consulting the related literature and combining it with expert knowledge, we identify and classify the characteristics (risk factors) of COVID-19 and obtain a conceptual framework for COVID-19 Risk Assessment Bayesian Networks (CRABNs). Second, data on COVID-19 patients, together with expert scoring results on patient risk levels, were collected from hospitals in Hubei Province, China, and used as the training set; the structure and parameters of the CRABNs model are then obtained through machine learning. Finally, we propose two indicators, Model Bias and Model Accuracy, and use the remaining data to verify the feasibility and effectiveness of the CRABNs model, ensuring that there are no significant differences between the model's predictions and the actual results provided by experts with relevant experience in treating COVID-19. We also compare the CRABNs model with support vector machine (SVM), random forest (RF), and k-nearest neighbour (KNN) models on four indicators: accuracy, sensitivity, specificity, and F-score. The results suggest that the model is reliable and has promising application potential. The proposed model can be used globally by doctors in hospitals as a decision support tool to improve the accuracy of assessing the severity of COVID-19 symptoms in patients. With further improvement, it could also be used for risk assessment in the field of epidemics.
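The four comparison indicators named above can be computed from a binary confusion matrix, as in the sketch below. It assumes the standard definitions and takes "F-score" to mean F1; the function name `screening_metrics` and the counts are hypothetical and are not taken from the paper.

```python
def screening_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
    }

# Made-up counts for 100 patients, for illustration only.
m = screening_metrics(tp=40, fp=10, tn=45, fn=5)
```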
Predicting the postoperative survival of lung cancer patients (LCPs) is an important problem in medical decision-making. However, the imbalanced distribution of patient survival in the dataset makes prediction more difficult. Although the synthetic minority oversampling technique (SMOTE) can be used to deal with imbalanced data, it cannot identify noisy data. Many studies also use a support vector machine (SVM) combined with resampling to deal with imbalanced data. However, most of them require the SVM parameters to be set manually, which makes it difficult to obtain the best performance. In this paper, a hybrid improved SMOTE and adaptive SVM method is proposed for imbalanced data to predict the postoperative survival of LCPs. The proposed method has two stages. In the first stage, the cross-validated committees filter (CVCF) is used to remove noisy samples and thereby improve the performance of SMOTE. In the second stage, we propose an adaptive SVM, which uses fuzzy self-tuning particle swarm optimization (FPSO) to optimize the parameters of the SVM. Compared with other advanced algorithms, our proposed method obtains the best performance, with 95.11% accuracy, 95.10% G-mean, 95.02% F1, and 95.10% area under the curve (AUC) for predicting the postoperative survival of LCPs.
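G-mean is reported alongside accuracy because accuracy alone can look good on imbalanced data. The sketch below assumes the usual definition, the geometric mean of sensitivity and specificity; the function name `g_mean` and the toy labels are assumptions, not data from the study.

```python
import math

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)

# Toy imbalanced set: nine majority-class cases and one minority-class case.
y_true = [0] * 9 + [1]
always_majority = [0] * 10  # a classifier that never predicts the minority class

accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
score = g_mean(y_true, always_majority)
```

Here the trivial classifier is right on nine of ten cases, yet its G-mean is zero because it misses every minority-class case.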
Class imbalance is a common issue in medical diagnosis. Although the standard radial basis function neural network (RBF-NN) has achieved remarkably high performance on balanced data, its ability to classify imbalanced data is still limited. As far as we know, cost-sensitive learning is an advanced method for handling imbalanced data. However, few studies have focused on combining RBF-NN with cost sensitivity. To our knowledge, only one paper has proposed a cost-sensitive RBF-NN, for software defect prediction, and its authors implemented a fixed RBF-NN structure. In this paper, a novel cost-sensitive RBF-NN that optimizes structure and parameters simultaneously is proposed to handle imbalanced medical data. A genetic algorithm (GA) and improved particle swarm optimization (IPSO) are used to optimize the structure and the parameters of the cost-sensitive RBF-NN, respectively, so that the network is optimized over a dynamic structure. A cost-sensitive function determined adaptively by the sample distribution is used as the objective function of the RBF-NN, so that it can adapt to datasets with different sample distributions. Experimental results show that the proposed cost-sensitive RBF-NN outperforms other representative state-of-the-art algorithms on five imbalanced medical diagnostic datasets in terms of accuracy and area under the curve (AUC). It can improve the accuracy of medical diagnosis and reduce the error rate of medical decisions.
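One common way to derive misclassification costs from the sample distribution is inverse class frequency, sketched below. The abstract does not give the paper's actual cost function, so this weighting scheme, the helper names `class_costs` and `weighted_squared_error`, and the toy labels and outputs are all assumptions.

```python
from collections import Counter

def class_costs(labels):
    """Assumed scheme: cost of a class is n / (k * class_count),
    so rarer classes receive larger costs."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_squared_error(y_true, outputs, costs):
    """Mean squared error with each sample weighted by its class cost."""
    total = sum(costs[t] * (t - o) ** 2 for t, o in zip(y_true, outputs))
    return total / len(y_true)

# Toy data: eight negatives and two positives (made-up values).
labels = [0] * 8 + [1] * 2
costs = class_costs(labels)
outputs = [0.1] * 8 + [0.6] * 2  # hypothetical network outputs
loss = weighted_squared_error(labels, outputs, costs)
```

Because the costs are recomputed from the class counts, the same objective can be reused on datasets with different imbalance ratios.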