2015
DOI: 10.1007/s11030-015-9649-4

Exploring different strategies for imbalanced ADME data problem: case study on Caco-2 permeability modeling

Abstract: In many absorption, distribution, metabolism, and excretion (ADME) modeling problems, imbalanced data could negatively affect classification performance of machine learning algorithms. Solutions for handling imbalanced dataset have been proposed, but their application for ADME modeling tasks is underexplored. In this paper, various strategies including cost-sensitive learning and resampling methods were studied to tackle the moderate imbalance problem of a large Caco-2 cell permeability database. Simple physic…
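The two families of strategies named in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the paper's implementation: the function names `balanced_class_weights` and `random_undersample`, the inverse-frequency weighting heuristic, and the toy labels are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Cost-sensitive idea: weight each class inversely to its frequency.

    Uses the common heuristic n_samples / (n_classes * class_count),
    an assumption here, not necessarily the weighting used in the paper.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def random_undersample(rows, labels, seed=0):
    """Resampling idea: randomly drop examples until every class
    has as many members as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = min(len(members) for members in by_class.values())
    sampled = []
    for label, members in by_class.items():
        for row in rng.sample(members, target):
            sampled.append((row, label))
    rng.shuffle(sampled)
    return [r for r, _ in sampled], [l for _, l in sampled]


# Hypothetical toy data: 6 "high" vs 2 "low" permeability labels.
labels = ["high"] * 6 + ["low"] * 2
rows = list(range(8))  # stand-ins for descriptor vectors

weights = balanced_class_weights(labels)
new_rows, new_labels = random_undersample(rows, labels)
```

The weights would typically be passed to a classifier's loss so that errors on the minority class cost more, while undersampling changes the training set itself and leaves the learner untouched.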

citations
Cited by 11 publications
(6 citation statements)
references
References 42 publications
“…The present study was motivated by the scarcity of reported efforts in the application of the above-mentioned methods to the SAR-based chemical classification domain. We conducted a literature survey which only identified a few studies in this domain where cost-sensitive learning [ 28 , 29 ], resampling [ 29 , 30 ], conformal prediction [ 18 ] and extreme entropy machines [ 1 , 31 ] were employed to specifically deal with data imbalance. Although predictive modeling was improved for certain datasets, a consistent performance enhancement was not observed as a result of resampling and algorithm modification.…”
Section: Introduction (mentioning)
confidence: 99%
“…The dataset consisted of 1043 compounds with human fa, 125 with rat fa, and 484 compounds with Caco-2 Papp and human fa. When multiple permeability values were found for a single compound from more than one source, the geometric mean of the values was used as the point estimate.…”
Section: Experimental Section (mentioning)
confidence: 99%
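The geometric-mean aggregation described in the statement above can be sketched as follows. The function name `geometric_mean` and the two reported values are hypothetical; the citing study's actual data and code are not shown in this report.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values, computed in log space."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one value, all positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical example: one compound with two reported Papp values (cm/s).
papp_reports = [1.0e-6, 4.0e-6]
estimate = geometric_mean(papp_reports)  # about 2.0e-6, i.e. sqrt(1e-6 * 4e-6)
```

Because permeability values often span orders of magnitude, averaging in log space keeps a single large report from dominating the point estimate the way an arithmetic mean would.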
“…PBCS was applied to evaluate the absorption profile. [34,45] In addition, the overall ADMET score compliance, especially Lipinski's RO5, bioavailability score, and characteristic toxicity were taken into account as the consensus criteria of "drug-likeness" for synthesized compounds. [31,46,47]…”
Section: Physicochemical and ADMET Profiling (mentioning)
confidence: 99%