Diabetes is one of the most rapidly spreading diseases in the world and leads to an array of serious complications, including cardiovascular disease, kidney failure, diabetic retinopathy, and neuropathy, which increase morbidity and mortality. If diabetes is diagnosed at an early stage, its severity and underlying risk factors can be significantly reduced. However, reliable and effective diabetes prediction is challenging because clinical datasets suffer from a shortage of labeled data, outliers, and missing values. Therefore, we introduce a newly labeled diabetes dataset from a South Asian nation (Bangladesh). In addition, we propose an automated classification pipeline that includes a weighted ensemble of machine learning (ML) classifiers: Naive Bayes (NB), Random Forest (RF), Decision Tree (DT), XGBoost (XGB), and LightGBM (LGB). Grid search hyperparameter optimization is employed to tune the critical hyperparameters of these ML models. The framework also incorporates missing value imputation, feature selection, and K-fold cross-validation. An analysis of variance (ANOVA) test shows that diabetes prediction improves significantly when the proposed weighted ensemble (DT + RF + XGB + LGB) is run with the introduced preprocessing, achieving the highest accuracy of 0.735 and an area under the ROC curve (AUC) of 0.832. Combined with the proposed ensemble model, our statistical imputation and RF-based feature selection techniques produced the best results for early diabetes prediction. Moreover, the new dataset will contribute to developing and implementing robust ML models for diabetes prediction using population-level data.
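The abstract does not state how the ensemble weights are chosen or how member outputs are combined. As a minimal sketch, assuming weighted soft voting (each model outputs a probability of diabetes and the ensemble averages these with fixed, user-supplied weights), the combination step could look like the following. The function names, the weights, and the example probabilities are all hypothetical.

```python
from typing import List, Sequence


def weighted_soft_vote(
    probas: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Weighted average of per-model positive-class probabilities.

    probas[m][i] is model m's predicted probability of diabetes for sample i.
    The weights are hypothetical inputs; the source does not say how they are set.
    """
    if not probas or len(probas) != len(weights):
        raise ValueError("need at least one model and one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = len(probas[0])
    if any(len(p) != n_samples for p in probas):
        raise ValueError("every model must score the same samples")
    # With non-negative weights, dividing by the weight sum keeps scores in [0, 1].
    return [
        sum(w * p[i] for w, p in zip(weights, probas)) / total
        for i in range(n_samples)
    ]


def to_labels(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn ensemble scores into 0/1 predictions (1 = diabetic)."""
    return [1 if s >= threshold else 0 for s in scores]


# Hypothetical probabilities from DT, RF, XGB, and LGB for three patients.
dt = [0.2, 0.7, 0.4]
rf = [0.3, 0.8, 0.6]
xgb = [0.1, 0.9, 0.5]
lgb = [0.2, 0.6, 0.7]
scores = weighted_soft_vote([dt, rf, xgb, lgb], weights=[1, 2, 2, 1])
labels = to_labels(scores)
```

With these illustrative weights, RF and XGB count twice as much as DT and LGB. In a real pipeline the weights and the 0.5 threshold would themselves be tuned, for example on the validation folds of the K-fold split.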
Research on heart disease has long been a central concern of the World Health Organization. In 2016, more than 17.9 million people died from heart disease, representing 31% of all deaths globally. Machine learning techniques have been used extensively in this area to help physicians form a firm opinion about the condition of their heart disease patients. However, some existing machine learning models still suffer from limited predictive ability, and the chosen analysis approaches are not suitable. Moreover, existing approaches focus on building high-accuracy models while overlooking the ability to interpret and understand the recommendations of these models. In this research, several well-known machine learning techniques (Artificial Neural Networks, Support Vector Machines, Naïve Bayes, Decision Trees, and Random Forests) were investigated to build, understand, and interpret different heart disease diagnosis models. The Artificial Neural Network model achieved the best accuracy of the models, 84.25%. In addition, it was found that although some models have higher accuracies than others, it may be safer to choose a lower-accuracy model as the final design of this study. This trade-off was essential to ensure that a more transparent and trustworthy model is used in the heart disease diagnosis process. Transparency was validated using a newly proposed metric, the Feature Ranking Cost index. This index showed promising results by making clear which machine learning model balances accuracy and transparency. The detailed analyses and findings of this research are expected to be useful to the machine learning community, as they could form the basis for post-hoc interpretation of prediction models on different clinical datasets.
Sunflower is a crop of considerable economic value with many ornamental uses. However, its production can be hampered by various diseases such as downy mildew, gray mold, and leaf scars, and it is challenging for farmers to identify disease-prone conditions with traditional approaches. Thus, a computerized model combining computer vision, artificial intelligence, and machine learning is needed to detect plant diseases efficiently. In this paper, we develop a hybrid model that combines transfer learning (TL) with a simple CNN to detect sunflower diseases from a small dataset. Of the eight models tested on a dataset of four classes (downy mildew, gray mold, leaf scars, and fresh leaf), the VGG19 + CNN hybrid model achieves the best results in terms of precision, recall, F1-score, accuracy, Hamming loss, Matthews correlation coefficient, Jaccard score, and Cohen’s kappa. The experimental outcomes show that the proposed model provides better precision, recall, and accuracy than other approaches on the benchmark dataset.
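Several of the listed metrics can be computed directly from a multiclass confusion matrix. The sketch below assumes the four classes are integer-coded 0 to 3 (a hypothetical encoding) and computes accuracy, Hamming loss (which equals one minus accuracy for single-label prediction), Cohen's kappa, and the multiclass Matthews correlation coefficient in Gorodkin's generalized form, the same form scikit-learn uses. The example labels are invented for illustration and are not from the benchmark dataset.

```python
from math import sqrt
from typing import Dict, List, Sequence

# Hypothetical integer encoding of the four classes named in the abstract.
CLASSES = ["downy_mildew", "gray_mold", "leaf_scars", "fresh_leaf"]


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], k: int) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def agreement_metrics(cm: List[List[int]]) -> Dict[str, float]:
    k = len(cm)
    s = sum(sum(row) for row in cm)                          # total samples
    c = sum(cm[i][i] for i in range(k))                      # correct predictions
    t = [sum(cm[i]) for i in range(k)]                       # true count per class
    p = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted count per class
    tp_dot = sum(t[i] * p[i] for i in range(k))

    accuracy = c / s
    # Cohen's kappa: observed agreement corrected for chance agreement p_e
    # (defined here as 0.0 when p_e == 1, where kappa is undefined).
    p_e = tp_dot / (s * s)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 0.0
    # Multiclass MCC (Gorodkin's R_K statistic); 0.0 when the denominator vanishes.
    denom = sqrt((s * s - sum(x * x for x in p)) * (s * s - sum(x * x for x in t)))
    mcc = (c * s - tp_dot) / denom if denom > 0 else 0.0
    return {
        "accuracy": accuracy,
        "hamming_loss": 1 - accuracy,  # single-label case only
        "cohen_kappa": kappa,
        "mcc": mcc,
    }


# Invented labels for eight leaf images (indices into CLASSES).
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 0]
metrics = agreement_metrics(confusion_matrix(y_true, y_pred, len(CLASSES)))
```

Here six of eight predictions are correct, so accuracy is 0.75 while kappa (about 0.667) is lower, because kappa discounts the agreement expected by chance.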
Planning effective routes and monitoring vehicle traffic are essential for creating sustainable smart cities. Accurate speed prediction is a key component of these efforts, as it helps alleviate traffic congestion. Although the physical proximity of road segments is important, it is their interconnection that contributes most to increasing traffic congestion. This interconnectedness makes it significantly harder to improve prediction accuracy. To address this, we propose a novel approach based on Deep Graph Neural Networks (DGNNs), which uses Graph Neural Networks (GNNs) to represent the connectivity of road segments as a graph. In this study, we implement the proposed approach, called STGGAN, for real-time traffic-speed estimation on two real-world traffic datasets: PeMSD4 and PeMSD8. The experiments yield prediction accuracies of 96.67% and 98.75% on the PeMSD4 and PeMSD8 datasets, respectively. The mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) also decline progressively as the iteration count increases, demonstrating the success of the proposed technique. To confirm the feasibility, reliability, and applicability of STGGAN, we also compare it with several statistical, analytical, machine-learning-based, and deep-learning-based approaches. Our work contributes significantly to traffic-speed estimation by using DGNNs to account for the structure and characteristics of road networks. The proposed technique trains a neural network to accurately predict traffic flow using data from the entire road network. Additionally, we extend DGNNs by incorporating Gated Graph Attention Network (GGAN) blocks, which allow the inputs and outputs to be sequential graphs.
The prediction accuracy of the proposed model based on DGNNs is thoroughly evaluated through extensive tests on real-world datasets, providing a comprehensive comparison with existing state-of-the-art models for traffic-flow forecasting.
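The four error measures mentioned for STGGAN have standard definitions. A minimal sketch over plain lists of observed and predicted speeds follows. Skipping (near-)zero observations in MAPE is an assumption, since a zero ground-truth speed would otherwise cause division by zero, and the example values are hypothetical rather than taken from PeMSD4 or PeMSD8.

```python
from math import sqrt
from typing import Dict, Sequence


def regression_errors(
    y_true: Sequence[float], y_pred: Sequence[float], eps: float = 1e-8
) -> Dict[str, float]:
    """MSE, RMSE, MAE, and MAPE (in percent) for observed vs. predicted speeds."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    diffs = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(d * d for d in diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    # Assumption: ignore (near-)zero observations in MAPE to avoid dividing by zero.
    kept = [(t, d) for t, d in zip(y_true, diffs) if abs(t) > eps]
    mape = 100 * sum(abs(d / t) for t, d in kept) / len(kept) if kept else float("nan")
    return {"mse": mse, "rmse": sqrt(mse), "mae": mae, "mape": mape}


# Hypothetical observed and predicted speeds on three road segments.
observed = [60.0, 50.0, 40.0]
predicted = [57.0, 52.0, 40.0]
errors = regression_errors(observed, predicted)
```

RMSE is reported in the same units as the speeds, whereas MAPE is scale-free, which is why the two are usually reported side by side.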