2022
DOI: 10.1186/s13321-022-00657-w

Tuning gradient boosting for imbalanced bioassay modelling with custom loss functions

Abstract: Although the number of available bioassay datasets has increased dramatically in recent years, many of them suffer from an extremely imbalanced distribution between active and inactive compounds. Thus, there is an urgent need for novel approaches to tackle class imbalance in drug discovery. Inspired by recent advances in computer vision, we investigated a panel of alternative loss functions for imbalanced classification in the context of Gradient Boosting and benchmarked them on six datasets from public…

citations
Cited by 9 publications
(9 citation statements)
references
References 47 publications
“…This process has been encouraged for research papers submitted to J. Cheminf. as demonstrated by these examples [ 22 , 23 ].…”
mentioning
confidence: 85%
“…23 As a result, the imbalance of chemical datasets can present challenges for modeling and analysis, and it is important to use suitable techniques and methods to handle imbalanced data. 17,18,22,24–28 Research in the field of imbalanced learning has provided better data analysis techniques to address imbalanced problems in these areas. In the imbalanced regression prediction modeling scenario, there are two characteristics 3 : (i) the skewed distribution of continuous response variables and (ii) a domain preference for underrepresented instances.…”
Section: Introduction
mentioning
confidence: 99%
“…The majority of in silico models that predict toxicity profiles and bioactivities were dependent on imbalanced datasets 23 . As a result, the imbalance of chemical datasets can present challenges for modeling and analysis, and it is important to use suitable techniques and methods to handle imbalanced data 17,18,22,24–28 . Research in the field of imbalanced learning has provided better data analysis techniques to address imbalanced problems in these areas.…”
Section: Introduction
mentioning
confidence: 99%
“…Therefore, there is an urgent need for new methods to address this problem of imbalance. 13–16 For example, cancer lectins are those that firmly recognize specific types of proteins, which initiate the tolerance, development, metastasis, and spread of cancer cells. We need to separate cancer lectins and noncancer lectins to better study cancer treatment.…”
Section: Introduction
mentioning
confidence: 99%
“…In the past few years, the number of available bioassay datasets has sharply increased, but many of them have extremely imbalanced distributions between active and inactive compounds or imbalance of sample number between the new and the previous experimental data. Therefore, there is an urgent need for new methods to address this problem of imbalance 13–16 . For example, cancer lectins are those that firmly recognize specific types of proteins, which initiate the tolerance, development, metastasis, and spread of cancer cells.…”
Section: Introduction
mentioning
confidence: 99%