2019
DOI: 10.1016/j.neucom.2018.11.100
Pre-processing approaches for imbalanced distributions in regression

Cited by 138 publications (152 citation statements)
References 13 publications
“…In this respect, learning a multi-dimensional LF raises the issue of learning from highly imbalanced data. This issue is extensively studied in the ML literature for classification problems, but has gathered much less attention from the regression point of view [39][40][41].…”
Section: Learning From Imbalanced Data
confidence: 99%
“…Applying the SMOTER strategy undersamples the normal values and oversamples the rare values by generating synthetic data points through interpolation between each rare value and a random selection of one of its k-nearest neighbors. 41,42,55 The feature vector and target value of a synthetic instance, $x_{new}$ and $y_{new}$ respectively, are determined as follows: 41

$$x_{new} = x_i + \lambda \,(x_{nn} - x_i), \qquad y_{new} = \frac{d_{nn}\, y_i + d_i\, y_{nn}}{d_i + d_{nn}}$$

where $x_i$ is the feature vector of an instance in the rare subset, $x_{nn}$ is one of the k-nearest neighbors of $x_i$, $\lambda \in [0, 1]$ is a random number, $y_i$ and $y_{nn}$ are the target values of $x_i$ and $x_{nn}$, respectively, and $d_i$ and $d_{nn}$ are the Euclidean distances between $x_{new}$ and $x_i$, and between $x_{new}$ and $x_{nn}$, respectively. The amount of undersampling and oversampling was automatically determined according to the following options: […]. Optimal hyperparameters were similarly determined by a grid search (Table 2).…”
Section: Synthetic Minority Oversampling Technique For Regression (SMOTER)
confidence: 99%
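The interpolation rule quoted above is concrete enough to sketch in code. The following is a minimal Python illustration of generating one SMOTER-style synthetic instance from a rare case and one of its k-nearest neighbors; the function name `smoter_synthetic` and the use of NumPy are illustrative assumptions, not the cited authors' implementation.

```python
import numpy as np

def smoter_synthetic(x_i, y_i, x_nn, y_nn, rng=None):
    """Generate one synthetic (x, y) pair by SMOTER-style interpolation.

    x_i, x_nn : feature vectors of a rare instance and one of its
                k-nearest neighbors (1-D NumPy arrays).
    y_i, y_nn : their target values.
    """
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.uniform(0.0, 1.0)          # lambda drawn from [0, 1]
    x_new = x_i + lam * (x_nn - x_i)     # interpolate the features

    # Target is a weighted average of the two seed targets, with
    # weights inversely proportional to the distance from the new
    # point to each seed.
    d_i = np.linalg.norm(x_new - x_i)
    d_nn = np.linalg.norm(x_new - x_nn)
    if d_i + d_nn == 0.0:                # seeds coincide
        return x_new, y_i
    y_new = (d_nn * y_i + d_i * y_nn) / (d_i + d_nn)
    return x_new, y_new

# Example: one synthetic point between two rare cases
x_new, y_new = smoter_synthetic(
    np.array([1.0, 2.0]), 10.0,
    np.array([3.0, 2.5]), 14.0,
)
```

Because the target weighting is inversely proportional to distance, a synthetic point generated close to the rare seed inherits a target close to $y_i$, so the scheme reduces to plain SMOTE behavior on the features while keeping the target consistent with the interpolation.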
“…30,32 Few methods have been proposed for working with imbalanced distributions in regression domains including: SMOTE for regression (SMOTER), 41 SMOGN, 42 meta learning for utility maximization (MetaUtil), 43 resampled bagging (REBAGG), 44 and weighted relevance-based combination strategy (WERCS). 45 In many bioinformatic and cheminformatic supervised-learning regression problems, the data often follows a normal distribution, and the rare extreme values may be more important to the user than the abundant values centered about the median of the distribution. For example, in predicting Topt for practical applications, higher Topt values are generally more relevant since thermostable enzymes are desired for enhanced biochemical reaction rates.…”
Section: Introduction
confidence: 99%
“…Ribeiro's thesis also introduces a new set of performance measures for regression; they substitute the calculated utility in place of the ground truth in the traditional precision and recall metrics, and the F1 measure derived from them. SmoteR is used as the basis for resampling imbalanced time series forecasting problems in Moniz et al. Extensions to the SmoteR algorithm include refining the results of utility-based regression by considering the utility of a point's nearest neighbors, or using Gaussian noise to create synthetic examples if a neighbor is too distant. It should, however, be noted that Torgo and his coauthors employ their utility-based F1 measure as the sole figure of merit in their experimental evaluations.…”
Section: Literature Review
confidence: 99%
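The Gaussian-noise extension mentioned in this excerpt corresponds to the SMOGN strategy cited earlier: interpolation is trusted only when the chosen neighbor lies within a safety distance, and otherwise the seed is perturbed with Gaussian noise. Below is a minimal sketch of that decision rule; the parameter names `max_dist` and `pert`, and the noise scaling, are assumptions for illustration rather than the reference implementation.

```python
import numpy as np

def smogn_style_synthetic(x_i, y_i, x_nn, y_nn, max_dist, pert=0.02, rng=None):
    """Interpolate toward a near neighbor; otherwise add Gaussian noise.

    max_dist : distance threshold beyond which the neighbor is
               considered too distant to interpolate safely (assumed name).
    pert     : scale of the Gaussian perturbation (assumed name; SMOGN
               scales noise per feature, simplified here).
    """
    if rng is None:
        rng = np.random.default_rng()
    gap = x_nn - x_i
    if np.linalg.norm(gap) <= max_dist:
        # Safe case: SMOTER-style interpolation, as in the earlier sketch.
        lam = rng.uniform(0.0, 1.0)
        x_new = x_i + lam * gap
        d_i = np.linalg.norm(x_new - x_i)
        d_nn = np.linalg.norm(x_new - x_nn)
        y_new = y_i if d_i + d_nn == 0.0 else (
            (d_nn * y_i + d_i * y_nn) / (d_i + d_nn)
        )
    else:
        # Neighbor too distant: perturb the seed with Gaussian noise
        # instead of interpolating across a sparse region.
        x_new = x_i + rng.normal(0.0, pert, size=x_i.shape)
        y_new = y_i + rng.normal(0.0, pert)
    return x_new, y_new
```

The design intent of the distance check is to avoid creating synthetic points in empty regions between far-apart rare cases, where an interpolated target would be unreliable.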