2019
DOI: 10.3390/ijfs7020028
On Unbalanced Sampling in Bankruptcy Prediction

Abstract: The paper discusses methodological topics of bankruptcy prediction modelling: unbalanced sampling, sample bias, and unbiased predictions of bankruptcy. Bankruptcy models are typically estimated with the use of non-random samples, which creates sample choice biases. We consider two types of unbalanced samples: (a) when bankrupt and non-bankrupt companies enter the sample in unequal numbers; and (b) when sample composition allows for different ratios of bankrupt and non-bankrupt companies than those in the popula…

Cited by 9 publications (3 citation statements)
References 21 publications

“…When samples are unbalanced, Cramer (1999) advocates the use of a cut-off point α equal to the proportion of ones in the sample. In effect, the success rates for y_i = 1 and y_i = 0 are better spread than for the typical cut-off point of 0.5 (Gruszczyński 2019).…”
mentioning
confidence: 87%
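Cramer's rule quoted above is simple to apply: classify a firm as bankrupt when its predicted probability exceeds the sample share of bankrupt firms rather than the conventional 0.5. A minimal sketch in Python on synthetic data (variable names and data are illustrative, not taken from the cited papers):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic unbalanced sample: roughly 15% bankrupt (y = 1), 85% non-bankrupt (y = 0).
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 1.6).astype(int)

model = LogisticRegression().fit(X, y)
p_hat = model.predict_proba(X)[:, 1]

# Cramer (1999): set the cut-off to the sample proportion of ones instead of 0.5.
alpha = y.mean()
pred_default = (p_hat >= 0.5).astype(int)
pred_cramer = (p_hat >= alpha).astype(int)

# The hit rates within each class are spread more evenly under the
# proportion-based cut-off than under the 0.5 cut-off.
for name, pred in [("cut-off 0.5", pred_default), ("cut-off = share of ones", pred_cramer)]:
    rate_1 = (pred[y == 1] == 1).mean()   # correctly classified bankrupt firms
    rate_0 = (pred[y == 0] == 0).mean()   # correctly classified non-bankrupt firms
    print(f"{name}: hit rate y=1: {rate_1:.2f}, hit rate y=0: {rate_0:.2f}")
```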
“…To clarify the nature of the problem, if we applied a methodology that predicted all the firms to fail in our data, we would still get an accuracy measure for correctly classified observations over total observations of 94.3%. To deal with similar unbalanced outcomes, one could either under-sample the label that accounts for the largest majority of observations, or oversample observations for the other label, as suggested in other applications (Gruszczyński, 2019; Zhou, 2013). In our case, we argue that any re-sampling method is sub-optimal because we know from Section 2, Table 2, and Appendix Table A2 that patterns of missing values are positively correlated with the outcome.…”
Section: Robustness and Sensitivity Checks
mentioning
confidence: 98%
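The under-sampling and over-sampling remedies mentioned in the quotation can be sketched as follows. The helper functions below are illustrative assumptions, not code from the cited papers:

```python
import numpy as np

def random_undersample(X, y, random_state=0):
    """Randomly drop majority-class rows so both classes appear in equal numbers.

    Illustrative helper: X is a 2-D feature array, y a binary label array
    with 1 = failed firm and 0 = non-failed firm.
    """
    rng = np.random.default_rng(random_state)
    idx_pos = np.flatnonzero(y == 1)
    idx_neg = np.flatnonzero(y == 0)
    idx_min, idx_maj = (idx_pos, idx_neg) if len(idx_pos) <= len(idx_neg) else (idx_neg, idx_pos)
    keep_maj = rng.choice(idx_maj, size=len(idx_min), replace=False)
    idx = np.concatenate([idx_min, keep_maj])
    rng.shuffle(idx)
    return X[idx], y[idx]

def random_oversample(X, y, random_state=0):
    """Draw minority-class rows with replacement until the classes are balanced."""
    rng = np.random.default_rng(random_state)
    idx_pos = np.flatnonzero(y == 1)
    idx_neg = np.flatnonzero(y == 0)
    idx_min, idx_maj = (idx_pos, idx_neg) if len(idx_pos) <= len(idx_neg) else (idx_neg, idx_pos)
    extra = rng.choice(idx_min, size=len(idx_maj) - len(idx_min), replace=True)
    idx = np.concatenate([idx_maj, idx_min, extra])
    rng.shuffle(idx)
    return X[idx], y[idx]
```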
“…This exchange of student data among neighboring cases blurred the results, especially in areas where large distances were required (see Figure 3). Nevertheless, this operation allowed us to avoid unbalanced sampling [77] and to prepare data for feature selection and model training. Our results indicated that the selected features of the models using Boruta resulted in RF predictions with overall scores of around 0.83-0.84% and Kappa values of 0.65-0.67%.…”
Section: Distance-based Sampling and Boruta Implementation
mentioning
confidence: 99%
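The pipeline quoted above (Boruta feature selection followed by a random forest scored with Cohen's Kappa) could look roughly like this in Python, assuming the boruta and scikit-learn packages on synthetic data; this is an illustrative reconstruction, not the cited authors' implementation:

```python
import numpy as np
from boruta import BorutaPy
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the features and labels used in the cited study.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = (X[:, 0] - X[:, 3] + rng.normal(size=500) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Boruta: keep only features confirmed as relevant against shadow features.
rf_for_boruta = RandomForestClassifier(n_jobs=-1, max_depth=5, random_state=0)
boruta = BorutaPy(rf_for_boruta, n_estimators="auto", random_state=0)
boruta.fit(X_train, y_train)
selected = boruta.support_

# Train the final random forest on the selected features only.
rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(X_train[:, selected], y_train)
pred = rf.predict(X_test[:, selected])

print("overall accuracy:", accuracy_score(y_test, pred))
print("Cohen's Kappa:", cohen_kappa_score(y_test, pred))
```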