Background: Imbalance between positive and negative outcomes, known as class imbalance, is a common problem in medical data. Despite numerous studies, class imbalance remains a difficult issue. The main objective of this study was to find an effective integrated approach to address the problems posed by class imbalance and to validate the method in an early screening model for a rare cardiovascular disease, aortic dissection (AD). Methods: Data-level methods, cost-sensitive learning, and bagging were combined to address the low sensitivity caused by the imbalance between the two classes. First, feature selection was applied to select the most relevant features using statistical analysis, including significance tests and logistic regression. Then, we assigned different misclassification costs to the two classes, constructed weak classifiers based on the support vector machine (SVM) model, and integrated the weak classifiers with undersampling and bagging to build the final strong classifier. Because AD is rare, the data imbalance was particularly prominent, so we applied our method to the construction of an early screening model for AD. Clinical data of 53,213 patients from the Institute of Hypertension, Xiangya Hospital, Central South University were used to verify the validity of this method. In these data, the ratio of AD patients to non-AD patients was 1:65, and each sample contained 71 features. Results: The proposed ensemble model achieved the highest sensitivity, 82.8%, with a specificity of 71.9% and a training time of 56.4 s. Additionally, its sensitivity showed a small variance of 19.58 × 10⁻³ in the seven-fold cross-validation experiment.
These results outperformed the common ensemble algorithms AdaBoost, EasyEnsemble, and Random Forest (RF), as well as the single machine learning (ML) methods logistic regression, decision tree, k-nearest neighbors (KNN), back-propagation neural network (BP), and SVM. Among the five single ML algorithms, the SVM model with cost-sensitive learning performed best, with a sensitivity of 79.5% and a specificity of 73.4%. Conclusions: In this study, we demonstrate that integrating feature selection, undersampling, cost-sensitive learning, and bagging can overcome the challenge of class imbalance in a medical dataset and yield a practical screening model for AD, which could provide decision support for screening for AD at an early stage.
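The combination of undersampling, cost-sensitive weak learners, and bagging described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the weak learner here is a cost-weighted threshold rule on a single feature standing in for the SVM, feature selection is omitted, and the names `undersample`, `fit_stump`, `fit_bagged`, `cost_fn`, and `cost_fp` are hypothetical, as are the default cost values.

```python
import random


def undersample(X, y, rng):
    """Keep every positive sample and an equal-sized random subset of negatives."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    keep = pos + rng.sample(neg, min(len(pos), len(neg)))
    return [X[i] for i in keep], [y[i] for i in keep]


def fit_stump(X, y, cost_fn=2.0, cost_fp=1.0):
    """Cost-weighted threshold rule (a simple stand-in for the SVM weak learner).

    A missed positive costs cost_fn and a false alarm costs cost_fp; the
    feature, threshold, and direction with the lowest total cost are kept.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                cost = 0.0
                for row, label in zip(X, y):
                    pred = 1 if sign * (row[j] - t) >= 0 else 0
                    if pred == 0 and label == 1:
                        cost += cost_fn
                    elif pred == 1 and label == 0:
                        cost += cost_fp
                if best is None or cost < best[0]:
                    best = (cost, j, t, sign)
    _, j, t, sign = best
    return lambda row: 1 if sign * (row[j] - t) >= 0 else 0


def fit_bagged(X, y, n_estimators=11, seed=0):
    """Train one weak learner per balanced bag and combine them by majority vote."""
    rng = random.Random(seed)
    models = [fit_stump(*undersample(X, y, rng)) for _ in range(n_estimators)]

    def predict(row):
        votes = sum(model(row) for model in models)
        return 1 if 2 * votes > len(models) else 0

    return predict
```

Each bag sees all of the rare positive samples but only a small random slice of the negatives, so every weak learner trains on balanced data while the ensemble as a whole still covers much of the majority class.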
Background: As a particularly dangerous and rare cardiovascular disease, aortic dissection (AD) is characterized by complex and diverse symptoms and signs. In the early stage, the rates of misdiagnosis and missed diagnosis are relatively high. This study aimed to use machine learning technology to establish a fast and accurate screening model that requires only patients' routine examination data as input to obtain predictive results. Methods: A retrospective analysis of the examination data and diagnosis results of 53,213 patients with cardiovascular disease was conducted. Among these samples, 802 had AD. Forty-two features were extracted from the patients' routine examination data to establish a prediction model. Five ensemble learning models were applied to explore the possibility of using machine learning methods to build screening models for AD: AdaBoost, XGBoost, SmoteBagging, EasyEnsemble, and XGBF. Among these, XGBF is an ensemble learning model that we propose to deal with the imbalance between positive and negative samples. Seven-fold cross-validation was used to analyze and verify the performance of each model. Because the samples were imbalanced, the evaluation indicators were sensitivity and specificity. Results: Comparative experiments showed that the sensitivity of XGBF was 80.5%, which was better than the 16.1% of AdaBoost, 15.7% of XGBoost, 78.0% of SmoteBagging, and 77.8% of EasyEnsemble. Additionally, XGBF had relatively high specificity and a short training time. Based on these three indicators, XGBF performed best and met the application requirements, which means that, through careful design, machine learning technology can be used to achieve early AD screening. Conclusions: Through reasonable design, the ensemble learning method can be used to build an effective screening model. XGBF has high practical application value for screening for AD.
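The evaluation protocol in this abstract (sensitivity and specificity under seven-fold cross-validation) can be sketched as follows. This is an illustrative sketch only: the abstract does not say whether the folds were stratified, so stratification is an assumption here, and the function names `sensitivity_specificity` and `stratified_folds` are hypothetical.

```python
import random


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 = AD."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


def stratified_folds(y, k=7, seed=0):
    """Split sample indices into k folds with roughly equal class proportions.

    Stratification is an assumption; it keeps the rare positive class
    present in every fold.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, value in enumerate(y) if value == label]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    return folds
```

These two indicators matter because accuracy alone is misleading at this class ratio: with 802 AD cases among 53,213 patients, a model that labels every patient as non-AD would be about 98.5% accurate while having a sensitivity of zero.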
This study aims to identify the effects of vehicle, roadway, driver, and environment characteristics on driver fatality in vehicle-fixed object accidents on expressways in the Changsha-Zhuzhou-Xiangtan district of Hunan Province, China, by developing multinomial logistic regression models. For this purpose, 121 vehicle-fixed object accidents from 2011 to 2017 are included in the modeling process. First, descriptive statistical analysis is performed to understand the main characteristics of the vehicle-fixed object crashes. Then, 19 explanatory variables are selected, and pairwise correlation analysis is conducted to choose the variables to be included. Finally, five multinomial logistic regression models with different independent variables are compared, and the model with the best fit and prediction capability is chosen as the final model. The results showed that the turning direction taken to avoid fixed objects raised the likelihood that drivers would die. About 64% of the drivers who died had been ejected from the vehicle, and 50% of those had not been wearing a seatbelt. Drivers were more likely to die when they encountered bad weather on the expressway. Drivers with less than 10 years of driving experience were more likely to die in these accidents. Fatigued or distracted driving was also a significant factor in driver fatality. The findings of this research provide insight into reducing driver fatality in vehicle-fixed object accidents.
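For readers unfamiliar with the model family named in this abstract, the way a multinomial logistic regression turns explanatory variables into outcome probabilities can be sketched as below. The coefficients and variables in any call are purely illustrative and are not taken from the study; the function name and argument layout are hypothetical.

```python
import math


def multinomial_logit_probs(x, coefs, intercepts):
    """Outcome probabilities under a multinomial logit model.

    The first outcome is the reference category with a score fixed at 0.
    coefs holds one coefficient list per non-reference outcome, and
    intercepts holds the matching intercepts.
    """
    scores = [0.0]
    for beta, b0 in zip(coefs, intercepts):
        scores.append(b0 + sum(b * v for b, v in zip(beta, x)))
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

In this parameterization, `math.exp` of a coefficient gives the odds ratio of the corresponding outcome against the reference outcome for a one-unit increase in that variable, which is how factors such as seatbelt use or weather are usually interpreted in models of this kind.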
Aortic dissection (AD) is a life-threatening disease with a hidden onset and rapid progression, and few effective methods exist for its early diagnosis. At present, although CT angiography is the gold standard for AD diagnosis, it is so expensive and time-consuming that it can hardly offer practical help to patients. Meanwhile, artificial intelligence technology may provide a cheap but effective approach to building an auxiliary diagnosis model that improves the early AD diagnosis rate by taking advantage of data on the general condition of AD patients, such as basic examination data. Therefore, this study proposes combining five types of machine learning operators into an integrated diagnosis model that serves as an auxiliary diagnostic approach alongside clinical analysis of AD. To improve diagnostic accuracy, the participation rate of each operator in the proposed model can be adjusted adaptively according to the results of learning from the data. In a set of experimental evaluations, the proposed model, acting as a preliminary AD discriminator, reached an accuracy of over 80%, which provides a promising example for clinicians.
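The idea of adjusting each operator's participation rate can be illustrated with a weighted vote. The abstract does not state the adaptation rule, so this sketch assumes one simple possibility: each operator's weight is proportional to its accuracy on held-out validation data. The function names and the 0.5 decision threshold are hypothetical.

```python
def accuracy_weights(val_preds, y_val):
    """Give each operator a weight proportional to its validation accuracy.

    val_preds holds one list of 0/1 predictions per operator.
    This weighting rule is an assumption, not the rule from the study.
    """
    accs = [
        sum(1 for p, t in zip(preds, y_val) if p == t) / len(y_val)
        for preds in val_preds
    ]
    total = sum(accs)
    if total == 0:
        return [1.0 / len(accs)] * len(accs)  # fall back to equal weights
    return [a / total for a in accs]


def weighted_vote(preds, weights, threshold=0.5):
    """Combine one 0/1 prediction per operator into a single decision."""
    score = sum(w for p, w in zip(preds, weights) if p == 1)
    return 1 if score >= threshold else 0
```

With this rule, an operator that performs poorly on the validation data contributes little to the final decision, and recomputing the weights after each round of learning is one way the participation rates could adapt.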