2017
DOI: 10.48550/arxiv.1706.09516

CatBoost: unbiased boosting with categorical features

Abstract: This paper presents the key algorithmic techniques behind CatBoost, a new gradient boosting toolkit. Their combination leads to CatBoost outperforming other publicly available boosting implementations in terms of quality on a variety of datasets. Two critical algorithmic advances introduced in CatBoost are the implementation of ordered boosting, a permutation-driven alternative to the classic algorithm, and an innovative algorithm for processing categorical features. Both techniques were created to fight a pre…
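The permutation-driven idea behind CatBoost's categorical-feature processing (ordered target statistics) can be sketched roughly as follows. This is an illustrative simplification under assumed choices of prior and smoothing, not CatBoost's exact implementation: each example is encoded using only the targets of examples that precede it in a random permutation, so an example's own target never leaks into its encoding.

```python
import random

def ordered_target_statistics(categories, targets, prior=0.5, seed=0):
    """Encode a categorical feature from 'past' examples only, in a
    random permutation (illustrative sketch, not CatBoost's exact scheme)."""
    n = len(categories)
    order = list(range(n))
    random.Random(seed).shuffle(order)  # the permutation defines "history"
    sums, counts = {}, {}
    encoded = [0.0] * n
    for idx in order:
        c = categories[idx]
        s, k = sums.get(c, 0.0), counts.get(c, 0)
        # smoothed mean target over examples seen earlier in the permutation;
        # the first occurrence of a category falls back to the prior
        encoded[idx] = (s + prior) / (k + 1)
        sums[c] = s + targets[idx]
        counts[c] = k + 1
    return encoded

values = ordered_target_statistics(["a", "b", "a", "a"], [1, 0, 1, 0])
```

Because the encoding for each row depends only on earlier rows in the permutation, the statistic is unbiased in the sense the paper targets; averaging over several permutations (as CatBoost does) reduces the variance this single-permutation sketch would have.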

Cited by 194 publications (207 citation statements). References 14 publications.
“…Different implementations of the Gradient Boosted Decision Trees method exist, e.g. XGBoost (Chen and Guestrin, 2016), LightGBM (Ke et al., 2017), CatBoost (Prokhorenkova et al., 2017). We use here LightGBM.…”
Section: Problem Settings
confidence: 99%
“…In this study, we will use the data from 2016-2018 to obtain the value of xi k for the rest of the period. Therefore, there will not be any issue with the target leakage problem (Zhang et al., 2013; Prokhorenkova et al., 2017).…”
Section: Competition-dependent Factor and Team-level Historical Records
confidence: 99%
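The target-leakage concern this citing paper raises is typically avoided with a strict temporal split: features for the prediction period are computed from historical rows only. A minimal illustration, where the field names and cutoff year are hypothetical rather than taken from the cited study:

```python
# Build a team-level feature (historical win rate) from rows up to a
# cutoff year, so targets from the prediction period never feed back
# into their own features (illustrative sketch; field names assumed).
rows = [
    {"year": 2016, "team": "A", "win": 1},
    {"year": 2017, "team": "A", "win": 0},
    {"year": 2018, "team": "A", "win": 1},
    {"year": 2019, "team": "A", "win": 1},  # prediction period: excluded
]

history = [r for r in rows if r["year"] <= 2018]

totals = {}
for r in history:
    wins, games = totals.get(r["team"], (0, 0))
    totals[r["team"]] = (wins + r["win"], games + 1)

team_feature = {t: wins / games for t, (wins, games) in totals.items()}
```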
“…We present metrics for the joint evaluation of predictive uncertainty and robustness to distributional shift. We validate our proposed metrics using the baseline Shifts Challenge Gradient Boosted Decision Trees (GBDT) models [15, 16].…”
Section: Evaluation Metrics
confidence: 99%