2021
DOI: 10.1007/s10994-021-06023-5
Density-based weighting for imbalanced regression

Abstract: In many real-world settings, imbalanced data impedes the performance of learning algorithms, like neural networks, mostly for rare cases. This is especially problematic for tasks focusing on these rare occurrences. For example, when estimating precipitation, extreme rainfall events are scarce but important considering their potential consequences. While there are numerous well-studied solutions for classification settings, most of them cannot be applied to regression easily. Of the few solutions for regress…

Cited by 87 publications (35 citation statements)
References 21 publications
“…The dataset was resampled using SMOGN as a data preprocessing step before training the regression models, because the data distribution was not balanced. The SMOGN algorithm, which mixes oversampling with Gaussian noise [57], is based on the SMOTER method. The density of the outcome of the dataset after oversampling with the SMOGN approach is shown in Figure 7 together with the initial data density.…”
Section: Regression Results
confidence: 99%
“…The SMOGN approach combines two oversampling techniques: SMOTER and the introduction of Gaussian noise. Iterating over all rare samples, SMOGN selects between SMOTER's interpolation-based oversampling and Gaussian-noise-based oversampling based on the distance to the k-nearest neighbors [57]. The data imbalance was addressed by resampling the dataset with SMOTE for the classification models and SMOGN for the regression models, changing the relative frequency of the labels.…”
Section: Resampling
confidence: 99%
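The selection rule quoted above — interpolate toward a close neighbor, otherwise perturb with Gaussian noise — can be sketched as follows. This is a minimal illustration of the SMOGN idea, not the reference SMOGN implementation; the function name `smogn_like_oversample` and the fixed `dist_threshold` are assumptions made for this sketch.

```python
import numpy as np

def smogn_like_oversample(X, y, rare_idx, k=5, dist_threshold=0.5,
                          noise_scale=0.05, seed=0):
    """SMOGN-style oversampling sketch for regression (hypothetical code).

    For each rare sample, pick a random neighbor among its k nearest.
    If the neighbor is close, interpolate features and target (SMOTER-style);
    if it is far, perturb the rare sample with Gaussian noise instead.
    """
    rng = np.random.default_rng(seed)
    new_X, new_y = [], []
    for i in rare_idx:
        d = np.linalg.norm(X - X[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]          # k nearest, excluding self
        j = rng.choice(neighbors)
        if d[j] <= dist_threshold:                  # "safe" case: interpolate
            t = rng.random()
            new_X.append(X[i] + t * (X[j] - X[i]))
            new_y.append(y[i] + t * (y[j] - y[i]))
        else:                                       # "risky" case: Gaussian noise
            new_X.append(X[i] + rng.normal(0.0, noise_scale, X.shape[1]))
            new_y.append(y[i] + rng.normal(0.0, noise_scale))
    return np.vstack([X, new_X]), np.concatenate([y, new_y])
```

The original samples are kept intact; one synthetic sample is appended per rare index, so the rare region's relative frequency increases.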
“…The data augmentation and class weight approaches are standards in deep learning. Data augmentation is an approach to increasing the diversity of training data [55], and class weighting assigns a weight to each categorical variable when the dataset is unbalanced [56]. These approaches might be employed to ensure the diversity and completeness of the selected projects in imbalanced project data.…”
Section: Discussion
confidence: 99%
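The class-weight idea mentioned above is usually realized as inverse-frequency weighting. A minimal sketch, assuming the common normalization in which a perfectly balanced dataset yields weight 1.0 for every class (the helper name `inverse_frequency_weights` is made up for this example):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class inversely to its frequency.

    Normalized so that with k equally frequent classes every
    weight equals 1.0; rarer classes get weights above 1.0.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}
```

For example, with labels `[0, 0, 0, 1]` the minority class 1 receives weight 2.0 and the majority class 0 receives weight 2/3, so the loss contributions of the two classes are balanced in expectation.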
“…Many approaches revolve around modifications of SMOTE, such as SMOTER, adapted to regression [38], SMOGN, augmented with Gaussian noise [39], or the work of [40], which extends these methods to the prediction of extremely rare values. [41] proposed DenseWeight, a method based on kernel density estimation for better assessment of the relevance function for sample reweighting. [42] proposed label distribution smoothing (LDS) and feature distribution smoothing (FDS) for imbalanced regression.…”
Section: Related Work
confidence: 99%
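The KDE-based reweighting idea attributed to DenseWeight in [41] can be sketched as below. This mirrors the general concept — estimate the target density and give low-density (rare) targets larger sample weights — but is not the authors' exact formulation; the function names and the normalization are assumptions of this sketch, and the KDE is a plain NumPy stand-in for a library estimator.

```python
import numpy as np

def gaussian_kde_1d(y, bandwidth=None):
    """Simple 1-D Gaussian kernel density estimate (pure NumPy)."""
    y = np.asarray(y, dtype=float)
    if bandwidth is None:                           # Silverman's rule of thumb
        bandwidth = 1.06 * y.std() * len(y) ** (-1 / 5)
    diffs = (y[:, None] - y[None, :]) / bandwidth
    return np.exp(-0.5 * diffs ** 2).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))

def dense_weight(y, alpha=1.0, eps=1e-6):
    """DenseWeight-style sample weights (sketch of the idea in [41]).

    Targets in dense regions are down-weighted; rare targets get
    larger weights. alpha controls the strength of the reweighting.
    """
    dens = gaussian_kde_1d(y)
    dens = (dens - dens.min()) / (dens.max() - dens.min())  # scale to [0, 1]
    w = np.maximum(1.0 - alpha * dens, eps)                 # down-weight dense regions
    return w * len(w) / w.sum()                             # normalize: mean weight 1
```

In a training loop, these weights would multiply the per-sample loss, so that rare extreme targets (e.g. heavy rainfall events) contribute more to the gradient than the abundant typical cases.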