2016
DOI: 10.1016/j.eswa.2016.09.010

A priori synthetic over-sampling methods for increasing classification sensitivity in imbalanced data sets

Cited by 60 publications
(14 citation statements)
References 27 publications
“…2. Oversampling The oversampling (upsampling) [17] method works by replicating observations from the minority class to obtain balanced data. The prominent disadvantage of oversampling is the likelihood that overfitting may be increased due to the extra copies of the minority class examples that are created.…”
Section: Informed Undersampling (mentioning)
confidence: 99%
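The replication described in the statement above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the cited authors' implementation: the function name random_oversample, the seed parameter, and the toy data are hypothetical and chosen only for the example.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Replicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Pool of original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Append exact copies; no new information is created.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (assumed): four majority rows, one minority row.
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = [0, 0, 0, 0, 1]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Because every added row is an exact copy of an existing minority row, a classifier can fit those repeated points very closely, which is the overfitting risk the statement mentions.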
“…According to the dephosphorization thermodynamics and practical operation in Consteel electric furnace, the end-point phosphorus content is mainly determined by 17 process variables, including the chemical composition of hot metal, hot metal weight, scrap weight, lime weight, dolomite weight, carbon powder weight, smelting cycle, limestone weight, oxygen consumption, natural gas consumption, electricity consumption, end-point C content and end-point temperature. Because the industrial data contains numerous noise data which would disturb the model construction and result in incorrect results, 14,15) box-plot method is employed in this paper to filter the exceptional data out, and for the preprocessed data, the main statistics information of the process parameters is shown in Table 1. The symbols from X1 to X17 present the input process variables, and Y is the output variable.…”
Section: Collection Of Main Data Parameters (mentioning)
confidence: 99%
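The box-plot filtering mentioned in the statement above can be sketched as follows. The statement does not give the exact rule, so this sketch assumes the conventional box-plot whiskers at 1.5 times the interquartile range; the function name boxplot_filter, the whisker parameter, and the sample values are hypothetical.

```python
import statistics


def boxplot_filter(values, whisker=1.5):
    """Keep only values inside the box-plot whiskers.

    Assumption: the conventional rule [Q1 - whisker*IQR, Q3 + whisker*IQR].
    The cited paper may use a different threshold.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if low <= v <= high]


# Made-up readings for one process variable; 50 is an obvious outlier.
readings = [10, 12, 11, 13, 12, 11, 50]
kept = boxplot_filter(readings)
print(kept)
```

For a table with several process variables, the same check would be applied per column and a row dropped if any of its values falls outside the whiskers; that row-wise policy is also an assumption, not something the statement specifies.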
“…The original industrial data usually concentrate in a small area because of the predefined process schedule. Modeling with these data results in the prediction accurate in high density data area while inaccurate in low density data area. Therefore, the data should be balanced before modeling.…”
Section: Industrial Data Processing (mentioning)
confidence: 99%