2022
DOI: 10.1007/s42001-022-00165-9

Interpolation of non-random missing values in financial statements’ big data using CatBoost

Abstract: Financial statements’ big data have the characteristics of “Incompleteness” and “Nonrepresentative”. In this paper, employing the world’s largest commercial database on finance, ORBIS, we first find that the rate of missing data varies depending on the country, the type and size of financial items, and the year. Using information on missing data, we interpolate non-random missing financial variables from the previous- and/or next-year values of the same financial item, the values of other financial items, and …
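The abstract names the predictors used for interpolation: previous- and/or next-year values of the same item, values of other items, and information on which values are missing. The following is only a rough sketch of how such predictors might be assembled for one missing cell before being passed to a gradient-boosting model such as CatBoost. The panel layout, the function name `build_features`, and the `_missing` flag columns are hypothetical and are not taken from the paper; the model-fitting step is deliberately left out.

```python
from typing import Dict, Optional, Tuple

# Hypothetical panel layout: (firm_id, year) -> {item_name: value or None}.
Panel = Dict[Tuple[str, int], Dict[str, Optional[float]]]


def build_features(
    panel: Panel, firm: str, year: int, target: str
) -> Dict[str, Optional[float]]:
    """Assemble predictors for one (firm, year, target) cell.

    Uses the adjacent-year values of the same item, the same-year values
    of the other items, and a 0/1 flag recording whether each predictor
    is itself missing.
    """
    row = panel.get((firm, year), {})
    prev_value = panel.get((firm, year - 1), {}).get(target)
    next_value = panel.get((firm, year + 1), {}).get(target)

    features: Dict[str, Optional[float]] = {
        f"{target}_prev": prev_value,
        f"{target}_next": next_value,
        f"{target}_prev_missing": 1.0 if prev_value is None else 0.0,
        f"{target}_next_missing": 1.0 if next_value is None else 0.0,
    }
    # Same-year values of the other financial items, plus missingness flags.
    for item, value in row.items():
        if item != target:
            features[item] = value
            features[f"{item}_missing"] = 1.0 if value is None else 0.0
    return features


# Toy data for illustration only; these are not figures from ORBIS.
panel: Panel = {
    ("A", 2019): {"sales": 100.0, "assets": 500.0},
    ("A", 2020): {"sales": None, "assets": 520.0},
    ("A", 2021): {"sales": 120.0, "assets": None},
}
features = build_features(panel, "A", 2020, "sales")
```

Keeping missing predictors as `None` alongside explicit flags is one possible design choice; whether the paper encodes missingness this way is not stated in the truncated abstract.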

Cited by 5 publications (3 citation statements)
References 27 publications
“…Similarly, in the banking and financial sectors, accurate data is the backbone of risk assessments, credit scoring, and investment decisions. Missing or incorrect data can lead to misguided financial strategies, erroneous lending decisions, or misestimation of market risks (Fujimoto et al., 2022). Household surveys, often used for demographic research or to assess consumer behavior, are another domain where imputation plays a pivotal role.…”
Section: Discussion (mentioning)
confidence: 99%
“…These new algorithms' statistical results do, in fact, significantly outperform those of the more established methods, but because of their inner complexity, often referred to as a "black box," it is impossible to confidently explain the judgments they generate. Although recent studies have addressed the issue by either developing explanation models [4] or combining the best of both worlds in efficient and understandable new techniques [5], this problem became even more difficult as clients and financial regulators stressed the need for clarity and explainability of the scoring processes.…”
Section: Related Work (mentioning)
confidence: 99%
“…Sami Ben Jabeur et al. used LightGBM to predict the oil price during the COVID-19 epidemic [9]; Liu, Yingr et al. used LightGBM to study whether Digital Inclusive Finance can predict household wealth and analyze the characteristics of strong predictive ability for household wealth. Sami Ben Jabeur et al. used CatBoost to predict bank bankruptcy [10]. Fujimoto Shouji et al. used CatBoost to interpolate the non-random missing values in the big data of financial statements [5].…”
Section: Related Work (mentioning)
confidence: 99%