2019
DOI: 10.1111/rssa.12467

Prediction of Default Probability by Using Statistical Models for Rare Events

Abstract: Prediction models in credit scoring usually involve the use of data sets with highly imbalanced distributions of the event of interest (default). Logistic regression, which is widely used to estimate the probability of default (PD), often suffers from separation when the event of interest is rare and, consequently, from poor predictive performance for the minority class in small samples. A common solution is to discard majority class examples, to duplicate minority class examples or to use a comb…

Cited by 12 publications (4 citation statements)
References 48 publications
“…In general, other studies have tended to find that Firth penalization does outperform logistic regression in the case of outcome imbalance (Heinze and Schemper, 2002; van Smeden et al., 2016; Kim et al., 2014; Doerken et al., 2019) and Log-F penalization shows promise when working with imbalanced data and can outperform Firth-penalization (Ogundimu, 2019; Rahman and Sultana, 2017).…”
Section: Research Questions (mentioning)
confidence: 99%
“…The intercepts of the outcome equation, β₀ = -2.78 and the selection equation, γ₀ = 1.90 are chosen such that the required event rate and missing data is about 10% and 22% respectively. The 10% event rate is typical of datasets for modelling PD (Ogundimu, 2019). The simulation design ensures that there is one predictor in the selection equation that is not in the outcome equation (exclusion restriction – although this is not essential as demonstrated in Ogundimu, 2021).…”
Section: Numerical Studies (mentioning)
confidence: 99%
“…Second, since variable selection provides sparse solution for the true model with true zero coefficients, the predictive performance of the model can be enhanced. We therefore propose a bootstrap internal validation method (Harrell et al, 1996; Ogundimu, 2019) for both the regularized and unregularized sample selection models. Unlike in previous work, where model validation is done using hold-out sample, the bootstrap approach can be used to quantify the degree of optimism in the model.…”
Section: Introduction (mentioning)
confidence: 99%
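
The bootstrap internal validation mentioned in the statement above can be illustrated with a minimal sketch of optimism correction in the style of Harrell et al. It is not the cited authors' implementation: the simulated data, the coefficient values, the plain Newton-Raphson logistic fit, the choice of AUC as the performance measure, and all function names (`fit_logistic`, `linear_predictor`, `auc`, `optimism_corrected_auc`) are assumptions made for illustration only.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Ordinary logistic regression fitted by Newton-Raphson.
    The tiny ridge term only keeps the Hessian invertible (illustrative choice)."""
    Xd = np.column_stack([np.ones(len(X)), X])  # add intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        eta = np.clip(Xd @ beta, -30, 30)       # guard against overflow in exp
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)
        grad = Xd.T @ (y - p) - ridge * beta
        hess = (Xd * w[:, None]).T @ Xd + ridge * np.eye(Xd.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def linear_predictor(beta, X):
    return beta[0] + X @ beta[1:]

def auc(y, score):
    """Area under the ROC curve via the Mann-Whitney rank-sum identity."""
    order = np.argsort(score)
    ranks = np.empty(len(score))
    ranks[order] = np.arange(1, len(score) + 1)
    n1 = int(y.sum())
    n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

def optimism_corrected_auc(X, y, n_boot=100, seed=0):
    """Apparent AUC minus the average bootstrap optimism."""
    rng = np.random.default_rng(seed)
    beta = fit_logistic(X, y)
    apparent = auc(y, linear_predictor(beta, X))
    optimism = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))   # resample rows with replacement
        Xb, yb = X[idx], y[idx]
        if yb.min() == yb.max():                # skip samples with a single class
            continue
        bb = fit_logistic(Xb, yb)
        auc_boot = auc(yb, linear_predictor(bb, Xb))  # performance on bootstrap sample
        auc_orig = auc(y, linear_predictor(bb, X))    # same model on original data
        optimism.append(auc_boot - auc_orig)
    return apparent, apparent - float(np.mean(optimism))

# Hypothetical simulated data with a rare outcome (values chosen for illustration).
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
true_eta = -2.5 + 0.8 * X[:, 0] - 0.5 * X[:, 1]
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-true_eta))).astype(int)

apparent, corrected = optimism_corrected_auc(X, y)
print(f"apparent AUC = {apparent:.3f}, optimism-corrected AUC = {corrected:.3f}")
```

The gap between the two printed values is the estimated optimism, which a single hold-out split does not quantify directly.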
“…According to the authors, the proposed log-F(m, m) priors are reasonable, transparent, and computationally straightforward for logistic regression. Emmanuel modeled the prediction of the default probability using the penalized regression models and found that the log-F prior methods are preferred [14]. Note that other methods imposing shrinkage on the regression coefficients can also overcome the separation issue.…”
Section: Introduction (mentioning)
confidence: 99%
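
For orientation, the log-F(m, m) prior referred to in the statement above can be written as a penalty added to the logistic log-likelihood. The block below is a sketch of the standard form, not a formula quoted from the paper: the symbols ℓ, β_j, p and the convention of leaving the intercept unpenalized are assumptions stated here for illustration.

```latex
% Log-F(m, m) prior density for a single coefficient \beta_j:
%   f(\beta_j) \propto \frac{e^{m\beta_j/2}}{(1 + e^{\beta_j})^{m}}
% Taking logs and summing over the p penalized coefficients gives
\ell^{*}(\beta) \;=\; \ell(\beta)
  \;+\; \sum_{j=1}^{p} \left[ \frac{m}{2}\,\beta_j \;-\; m \log\!\left(1 + e^{\beta_j}\right) \right]
```

Each penalty term is maximized at β_j = 0 and decreases as |β_j| grows, so the estimates stay finite even under separation; larger m gives stronger shrinkage.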