Support vector machines for classification of low birth weight in Indonesia

Eliyati, Ning; Faruk, Alfensi; Kresnawati, Endang Sri; Arifieni, Ika

doi:10.1088/1742-6596/1282/1/012010

Cited by 14 publications

(7 citation statements)

References 5 publications


“…Recently, the applications of ML in the field of public health have increased day by day. Some works on ML were used for prediction of different fields as malnutrition [22–24], anemia [25–27], diabetes [28], low birth weight [29–32], child mortality [33–35], and so on. There was also some work on ML for prediction of underweight [22–24, 36, 37], stunted and wasted [23, 24].…”

Section: Introduction (mentioning)

confidence: 99%

Investigate the risk factors of stunting, wasting, and underweight among under-five Bangladeshi children and its prediction based on machine learning approach

et al. 2021


Aims: Malnutrition is a major health issue among Bangladeshi under-five (U5) children. Children are malnourished if the calories and proteins they take in through their diet are not sufficient for their growth and maintenance. The goal of the research was to use machine learning (ML) algorithms to detect the risk factors of malnutrition (stunting, wasting, and underweight) and to predict it. Methods: This work used malnutrition data derived from the Bangladesh Demographic and Health Survey conducted in 2014. The selected dataset consisted of 7079 children with 13 factors. Logistic regression (LR) was used to identify the potential risk factors of malnutrition. In addition, 3 ML classifiers (support vector machine (SVM), random forest (RF), and LR) were implemented to predict malnutrition, and their performance was assessed on the basis of accuracy. Results: The average prevalence of stunting, wasting, and underweight was 35.4%, 15.4%, and 32.8%, respectively. LR identified five risk factors each for stunting and underweight, and four for wasting. RF classified stunted, wasted, and underweight children most accurately, with the highest accuracy of 88.3% for stunting, 87.7% for wasting, and 85.7% for underweight. Conclusion: This research focused on identifying the major risk factors for stunting, wasting, and underweight and on predicting these outcomes with ML algorithms, which will aid policymakers in reducing malnutrition among Bangladesh’s U5 children.
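Because the abstract reports accuracy only, it can help to set each figure against a trivial baseline. The sketch below is a reader's check, not part of the cited study: it assumes the reported prevalence also holds for the evaluation data and compares the RF accuracy with a classifier that always predicts the more common class.

```python
# Prevalence and best reported accuracy (random forest), in percent,
# copied from the abstract above.
reported = {
    "stunted": {"prevalence": 35.4, "rf_accuracy": 88.3},
    "wasted": {"prevalence": 15.4, "rf_accuracy": 87.7},
    "underweight": {"prevalence": 32.8, "rf_accuracy": 85.7},
}


def majority_baseline_accuracy(prevalence_pct):
    """Accuracy (in percent) of always predicting the more common class."""
    return max(prevalence_pct, 100.0 - prevalence_pct)


# Gain of the reported accuracy over the baseline, in percentage points.
# Assumption: the evaluation data has the same prevalence as reported.
gains = {}
for outcome, row in reported.items():
    baseline = majority_baseline_accuracy(row["prevalence"])
    gains[outcome] = round(row["rf_accuracy"] - baseline, 1)
    print(f"{outcome}: baseline {baseline:.1f}%, RF {row['rf_accuracy']:.1f}%, "
          f"gain {gains[outcome]:.1f} points")
```

Under that assumption the margin over the baseline is much smaller for wasting than for the other two outcomes, which is the usual caveat when accuracy is read on its own for a less common class.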


“…Based on statistical learning theory, this algorithm can be applied to all linear and nonlinear classification problems. In classification, linear or non-linear (Kernel type) functions are used based on the structure of the process ( Noble, 2006 ; Eliyati et al, 2019 ). SVM basically tries to separate two classes with a line or plane.…”

Section: Methods (mentioning)

confidence: 99%
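The idea of separating two classes with a line can be illustrated with a minimal linear SVM. This is a sketch under stated assumptions, not the solver used by any of the papers listed here: it minimises the L2-regularised hinge loss by stochastic subgradient descent, and the toy points, the function names, and the parameters `lam`, `eta`, and `epochs` are hypothetical choices for illustration. Kernel (non-linear) SVMs are not covered.

```python
import random


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=50, seed=0):
    """Fit a separating line w.x + b = 0 for labels in {+1, -1}.

    Uses stochastic subgradient descent on the regularised hinge loss
    lam/2 * |w|^2 + max(0, 1 - y * (w.x + b)).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # Gradient step on the L2 penalty: shrink the weights slightly.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge loss is active only when the point is inside the margin.
            if y[i] * score < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the line x falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1


# Hypothetical, linearly separable toy data in two dimensions.
X = [[2, 2], [3, 3], [2, 3], [3, 2], [-2, -2], [-3, -3], [-2, -3], [-3, -2]]
y = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(X, y)
train_preds = [predict(w, b, x) for x in X]
print("all training points on the correct side:", train_preds == y)
```

The weight vector `w` is the normal of the separating line and `b` shifts it; in higher dimensions the same code fits a plane or hyperplane.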

Application of interval type-2 fuzzy logic and type-1 fuzzy logic-based approaches to social networks for spam detection with combined feature capabilities

Atacak, Çıtlak, Doğru 2023, PeerJ Computer Science


Background: Social networks are large platforms that allow their users to interact with each other on the Internet. Their widespread use has made them a target for malicious activity such as fake accounts and spam, and many users are exposed to the harmful effects of spam accounts created by malicious people. Although Twitter, one of the most popular social networking platforms, uses spam filters to protect its users, these filters are insufficient to detect spam accounts that exhibit new methods and behaviours. Social networking platforms like Twitter therefore need more robust and dynamic methods to detect spam accounts. Methods: Fuzzy logic (FL) based approaches generate results by interpreting data heuristically in light of past experience, so they can provide robust and dynamic solutions in spam detection, as in many other application areas. For this purpose, a data set was created by collecting data from the Twitter platform, and FL-based classification approaches are proposed for spam detection. In the first stage of the proposed method, a data set with extracted attributes was obtained by applying normalization and crowdsourcing approaches to the raw data from Twitter. In the next stage, as part of data preprocessing, six binary attributes in the data set were subjected to a rating-based transformation and combined with the remaining real-valued attribute to create the database used in spam detection. The classification inputs were obtained by applying the Fisher score method, a commonly used filter-based method, to the data set from the second stage. In the last stage, the data were classified with FL-based approaches using these inputs. Four different Mamdani and Sugeno fuzzy inference systems, based on type-1 and interval type-2 fuzzy logic, were used as the FL approaches. Finally, four machine learning (ML) approaches, namely support vector machine (SVM), Bayesian point machine (BPM), logistic regression (LR), and average perceptron (Avr Prc), were used to test the effectiveness of the FL approaches in detecting spam. Results: Experimental results were obtained by applying the different FL- and ML-based approaches to the data set created in the study. The interval type-2 Mamdani fuzzy inference system (IT2M-FIS) provided the highest performance, with an accuracy of 0.955, a recall of 0.967, an F-score of 0.962, and an area under the curve (AUC) of 0.971. Overall, the FL-based spam models performed better than the ML-based spam models in terms of accuracy, recall, F-score, and AUC.
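The Fisher score step mentioned in the methods can be sketched as follows. The abstract does not state which variant the authors used, so this assumes one common formulation: for each feature, the class-size-weighted squared distance of class means from the overall mean, divided by the class-size-weighted within-class variance. The data and the function name are hypothetical.

```python
def fisher_scores(X, y):
    """Return one Fisher score per feature column (higher = more discriminative).

    Assumed formulation:
    sum_c n_c * (mean_c - mean)^2 / sum_c n_c * var_c, per feature.
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = 0.0  # spread of class means around the overall mean
        within = 0.0   # spread of values inside each class
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            class_mean = sum(values) / len(values)
            class_var = sum((v - class_mean) ** 2 for v in values) / len(values)
            between += len(values) * (class_mean - overall_mean) ** 2
            within += len(values) * class_var
        scores.append(between / within if within > 0 else float("inf"))
    return scores


# Hypothetical data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [2.0, 1.0], [1.5, 3.0], [8.0, 4.0], [9.0, 2.0], [8.5, 3.0]]
y = [0, 0, 0, 1, 1, 1]

scores = fisher_scores(X, y)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print("scores:", scores, "ranking:", ranking)
```

A filter-based selection step would then keep the top-ranked features as classifier inputs, independently of which classifier is trained afterwards.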


“…Although these studies achieved remarkable ML model performance on a small imbalanced data set, the results could be misleading and biased toward the majority class (ie, non-LBW) owing to the data imbalance issue causing them to learn based on the error rate without considering the class distribution. The studies in the second group used larger imbalanced data sets but still did not apply any rebalancing methods to their imbalanced data sets [22,23,27-29,34]. Their high accuracy and low area under the receiver operating characteristic curve (AUROC) scores revealed that misleading performance remains a persistent issue [33].…”

Section: Related Work (mentioning)

confidence: 99%
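The point about high accuracy alongside low AUROC can be reproduced on a toy example. The labels and scores below are hypothetical and not taken from any cited study; AUROC is computed from its rank definition (the probability that a random positive case is scored above a random negative case, with ties counted as one half).

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual positives that were predicted positive."""
    positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(positives) / len(positives)


def auroc(y_true, scores):
    """Rank-based AUROC: P(score of a positive > score of a negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced sample: 2 positives (e.g. LBW) out of 20.
y_true = [1, 1] + [0] * 18
# Hypothetical model scores that carry little information about the label.
negative_scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45] * 2
scores = [0.25, 0.20] + negative_scores
# With a 0.5 threshold every case is predicted negative.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
auc = auroc(y_true, scores)
print(f"accuracy={acc:.2f} recall={rec:.2f} AUROC={auc:.2f}")
```

Here the model never flags a positive case, yet its accuracy equals the share of the majority class, while recall and AUROC expose that it does not separate the classes.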

Issue of Data Imbalance on Low Birthweight Baby Outcomes Prediction and Associated Risk Factors Identification: Establishment of Benchmarking Key Machine Learning Models With Data Rebalancing Strategies

Ren, Wu, Tong et al. 2023, J Med Internet Res


Background: Low birthweight (LBW) is a leading cause of neonatal mortality in the United States and a major causative factor of adverse health effects in newborns. Identifying high-risk patients early in prenatal care is crucial to preventing adverse outcomes. Previous studies have proposed various machine learning (ML) models for the LBW prediction task, but they were limited by small and imbalanced data sets. Some authors attempted to address this through different data rebalancing methods. However, most of their reported performances did not reflect the models’ actual performance in real-life scenarios. To date, few studies have successfully benchmarked the performance of ML models in maternal health; thus, it is critical to establish benchmarks to advance ML use and subsequently improve birth outcomes. Objective: This study aimed to establish several key benchmarking ML models to predict LBW and systematically apply different rebalancing optimization methods to a large-scale and extremely imbalanced all-payer hospital record data set that connects mother and baby data at a state level in the United States. We also performed feature importance analysis to identify the most contributing features in the LBW classification task, which can aid in targeted intervention. Methods: Our large data set consisted of 266,687 birth records across 6 years, and 8.63% (n=23,019) of records were labeled as LBW. To set up benchmarking ML models to predict LBW, we applied 7 classic ML models (ie, logistic regression, naive Bayes, random forest, extreme gradient boosting, adaptive boosting, multilayer perceptron, and sequential artificial neural network) while using 4 different data rebalancing methods: random undersampling, random oversampling, synthetic minority oversampling technique, and weight rebalancing. Owing to ethical considerations, in addition to ML evaluation metrics, we primarily used recall to evaluate model performance, that is, the proportion of correctly predicted LBW cases out of all actual LBW cases, as false negative health care outcomes could be fatal. We further analyzed feature importance to explore the degree to which each feature contributed to ML model prediction among our best-performing models. Results: We found that extreme gradient boosting achieved the highest recall score (0.70) using the weight rebalancing method. Our results showed that various data rebalancing methods improved the prediction performance of the LBW group substantially. From the feature importance analysis, maternal race, age, payment source, sum of predelivery emergency department and inpatient hospitalizations, predelivery disease profile, and different social vulnerability index components were important risk factors associated with LBW. Conclusions: Our findings establish useful ML benchmarks to improve birth outcomes in the maternal health domain. They are informative for identifying the minority class (ie, LBW) based on an extremely imbalanced data set, which may guide the development of personalized LBW early prevention, clinical interventions, and statewide maternal and infant health policy changes.
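Three of the four rebalancing methods named in the abstract can be sketched with the standard library. The record counts come from the abstract; everything else is an assumption for illustration: the abstract does not say how the class weights were set, so inverse-frequency weights are used here, the helper names are hypothetical, and the resampling is shown on a small toy label list. The synthetic minority oversampling technique is omitted because it needs feature vectors and nearest-neighbour interpolation.

```python
import random

# Counts reported in the abstract.
n_total = 266_687
n_lbw = 23_019
n_non_lbw = n_total - n_lbw


def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (number of classes * class count).

    Assumption: one common weighting heuristic, not necessarily the study's.
    """
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}


def split_indices(labels, minority):
    """Return (minority indices, majority indices) for a binary label list."""
    minority_idx = [i for i, label in enumerate(labels) if label == minority]
    majority_idx = [i for i, label in enumerate(labels) if label != minority]
    return minority_idx, majority_idx


def random_undersample(labels, minority, rng):
    """Keep every minority row and an equal-sized random subset of majority rows."""
    minority_idx, majority_idx = split_indices(labels, minority)
    return sorted(minority_idx + rng.sample(majority_idx, len(minority_idx)))


def random_oversample(labels, minority, rng):
    """Keep every row and repeat random minority rows until classes are equal."""
    minority_idx, majority_idx = split_indices(labels, minority)
    extra = rng.choices(minority_idx, k=len(majority_idx) - len(minority_idx))
    return sorted(minority_idx + majority_idx + extra)


weights = balanced_class_weights({"LBW": n_lbw, "non-LBW": n_non_lbw})
print("class weights:", weights)

# Hypothetical toy labels: 2 minority rows (1) and 10 majority rows (0).
toy_labels = [1, 1] + [0] * 10
rng = random.Random(0)
under_idx = random_undersample(toy_labels, minority=1, rng=rng)
over_idx = random_oversample(toy_labels, minority=1, rng=rng)
print("undersampled size:", len(under_idx), "oversampled size:", len(over_idx))
```

With inverse-frequency weights, each class contributes the same total weight to the training loss, so an error on an LBW record costs roughly ten times as much as an error on a non-LBW record at this class ratio.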
