2014
DOI: 10.1021/ci400737s

QSAR Modeling of Imbalanced High-Throughput Screening Data in PubChem

Abstract: Many of the structures in PubChem are annotated with activities determined in high-throughput screening (HTS) assays. Because of the nature of these assays, the activity data are typically strongly imbalanced, with a small number of active compounds contrasting with a very large number of inactive compounds. We have used several such imbalanced PubChem HTS assays to test and develop strategies to efficiently build robust QSAR models from imbalanced data sets. Different descriptor types [Quantitative Neighborho…

Citations: cited by 104 publications (112 citation statements)
References: 26 publications
“…Consequently, when trying to predict a minority class in an unbalanced data set, machine learning methods are prone to assign most samples to the majority class, resulting in a large number of erroneous predictions for minority class. 36 …”
Section: Results (mentioning)
confidence: 99%
“…This imbalanced nature of HTS data presents a great challenge for developing an accurate prediction model from them. 91-94 This issue may be addressed by generating a balanced data set through resampling of the original HTS data set. Several studies 33,91-94 have applied different resampling techniques for analysis of HTS data in PubChem.…”
Section: Dealing With Data Imbalance Issues In PubChem Data (mentioning)
confidence: 99%
“…91-94 This issue may be addressed by generating a balanced data set through resampling of the original HTS data set. Several studies 33,91-94 have applied different resampling techniques for analysis of HTS data in PubChem. They are broadly categorized into two classes: undersampling of the majority class (inactive compounds) and oversampling of the minority class (active compounds).…”
Section: Dealing With Data Imbalance Issues In PubChem Data (mentioning)
confidence: 99%
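
The two resampling classes named in the statement above can be illustrated with a minimal sketch. This is not the procedure used in the cited paper or in the citing studies; it only shows plain random undersampling and random oversampling on a toy data set. The function names `undersample_majority` and `oversample_minority` and the compound identifiers are hypothetical, chosen for illustration.

```python
import random


def undersample_majority(actives, inactives, seed=0):
    """Randomly keep only as many inactives as there are actives."""
    rng = random.Random(seed)
    kept = rng.sample(inactives, k=min(len(actives), len(inactives)))
    return list(actives), kept


def oversample_minority(actives, inactives, seed=0):
    """Randomly duplicate actives (with replacement) until both classes match.

    Assumes `actives` is non-empty.
    """
    rng = random.Random(seed)
    extra = max(len(inactives) - len(actives), 0)
    return list(actives) + rng.choices(actives, k=extra), list(inactives)


# Toy imbalanced "assay": 3 actives against 97 inactives (made-up identifiers).
actives = ["A1", "A2", "A3"]
inactives = [f"I{i}" for i in range(97)]

under_actives, under_inactives = undersample_majority(actives, inactives)
over_actives, over_inactives = oversample_minority(actives, inactives)

print(len(under_actives), len(under_inactives))
print(len(over_actives), len(over_inactives))
```

Undersampling discards most of the inactive compounds, while oversampling repeats the few active ones; either way the two classes end up the same size before a model is trained.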
“…Three-dimensional conformers of most chemical molecules are also available [21]. PubChem data have been used in various researches including Quantitative Structure-Activity relationship (QSAR) studies [21], [22].…”
Section: Rungsang Nakrumpai (mentioning)
confidence: 99%