Extreme Gradient Boosting as a Method for Quantitative Structure–Activity Relationships

2016 | DOI: 10.1021/acs.jcim.6b00591

Abstract: In the pharmaceutical industry, it is common to generate many QSAR models from training sets containing a large number of molecules and a large number of descriptors. The best QSAR methods are those that can generate the most accurate predictions but that are not overly expensive computationally. In this paper we compare eXtreme Gradient Boosting (XGBoost) to random forest and single-task deep neural nets on 30 in-house data sets. While XGBoost has many adjustable parameters, we can define a set of standard par…
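To make the method being compared concrete, the following is a minimal from-scratch sketch of gradient boosting for regression using single-split trees (stumps). This illustrates only the core boosting idea — each round fits the residuals of the current ensemble — and is not the paper's implementation: real XGBoost adds second-order gradients, regularization, and highly optimized tree construction.

```python
# Minimal gradient boosting for regression with stumps, for illustration
# only. Real XGBoost uses regularized, second-order tree boosting.

def fit_stump(x, residuals):
    """Find the threshold on a single descriptor that minimizes squared
    error of the residuals; return (threshold, left_mean, right_mean)."""
    best, best_err = None, float("inf")
    for t in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if err < best_err:
            best_err, best = err, (t, lm, rm)
    return best

def gradient_boost(x, y, n_rounds=50, lr=0.1):
    """Fit an ensemble of stumps; each round fits the residuals
    (the negative gradient of squared loss) of the current prediction."""
    base = sum(y) / len(y)          # start from the mean response
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        if stump is None:
            break
        t, lm, rm = stump
        stumps.append(stump)
        # shrink each correction by the learning rate before adding it
        pred = [pi + lr * (lm if xi <= t else rm) for xi, pi in zip(x, pred)]
    return base, stumps

def predict(base, stumps, xi, lr=0.1):
    """Sum the base value and the shrunken stump corrections."""
    out = base
    for t, lm, rm in stumps:
        out += lr * (lm if xi <= t else rm)
    return out
```

With a small learning rate, each round removes only a fraction of the remaining residual, which is why boosted ensembles need many rounds but generalize well.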

Cited by 387 publications (259 citation statements) | References 20 publications
“…We build an individual model for each cell line using the popular Random Forest (RF) algorithm 40 . We also build a second model per cell line using XGBoost (XGB for short) 41 , a recent machine learning method that has helped to win numerous Kaggle competitions 41 as well as to generate highly predictive QSAR models 42 . We validate these models for commonly-encountered prediction scenarios: e.g.…”
Section: Introductionmentioning
confidence: 99%
“…XGBoost has been tested in a series of datasets for QSAR modelling, achieving high accuracy and requiring much less computation time than deep neural nets 63 . There are several adjustable parameters in XGBoost.…”
Section: Methodsmentioning
confidence: 99%
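For context on the "several adjustable parameters" mentioned above, the commonly tuned XGBoost hyperparameters govern tree depth, shrinkage, and row/column subsampling. The names below follow the xgboost Python API; the values are illustrative placeholders, not the standard settings recommended in the paper:

```python
# Commonly tuned XGBoost hyperparameters (names from the xgboost Python
# package; values are illustrative, not the paper's recommendations).
xgb_params = {
    "max_depth": 6,           # maximum depth of each tree
    "learning_rate": 0.05,    # shrinkage per boosting step (eta)
    "n_estimators": 500,      # number of boosting rounds
    "subsample": 0.8,         # fraction of training rows sampled per tree
    "colsample_bytree": 0.8,  # fraction of descriptors sampled per tree
    "min_child_weight": 1,    # minimum instance weight required in a leaf
    "reg_lambda": 1.0,        # L2 regularization on leaf weights
}
```

In practice only a few of these (depth, learning rate, number of rounds) dominate predictive performance, which is why a fixed "standard" setting can work well across many data sets.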
“…RF can also handle both categorical and continuous variables, which can return the importance of variables and be freely implemented with high quality (Statnikov, Wang, & Aliferis, 2008). In addition, extreme random tree (ERT; Šícho, Kops, Stork, Svozil, & Kirchmair, 2017), AdaBoost (Pérez-Castillo et al., 2012), gradient boosting trees (GBT) (Ericksen et al., 2017) and XGBoost (Sheridan, Wei, Liaw, Ma, & Gifford, 2016) models are also widely used in drug design and discovery and achieve favorable outcomes.…”
Section: Introductionmentioning
confidence: 99%