2015 IEEE Trustcom/BigDataSE/ISPA (2015)
DOI: 10.1109/trustcom.2015.581

Compounds Activity Prediction in Large Imbalanced Datasets with Substructural Relations Fingerprint and EEM

Abstract: Modern drug design procedures involve the process of virtual screening, a highly efficient filtering step used for maximizing the efficiency of the preselection of compounds which are valuable drug candidates. Recent advances in introduction of machine learning models to this process can lead to significant increase in the overall quality of the drug designing pipeline. Unfortunately, for many proteins it is still extremely hard to come up with a valid statistical model. It is a consequence of huge cla…

Cited by 10 publications (13 citation statements)
References 13 publications
“…However, there is a lack of studies on the classification of extremely imbalanced datasets. In real-life applications such as fraud detection [61] or cheminformatics [12] we may deal with problems with imbalance ratio ranging from 1:1000 up to 1:5000. This poses new challenges to data preprocessing and classification algorithms, as they must be adjusted to such extreme scenarios.…”
Section: Extreme Class Imbalance (mentioning)
confidence: 99%
“…Structure–activity relationship (SAR) has been frequently used to predict the biological activities of chemicals from their molecular structures. One of the major challenges in SAR-based chemical classification or drug discovery is the extreme imbalance between active and inactive chemicals [1]. Despite the existence of as many as 10^7 commercially available molecules [2], there is almost always a skew in the distribution of molecules across the bioactivity landscape or toxicity classes.…”
Section: Introduction (mentioning)
confidence: 99%
“…This leads to low sensitivity and precision for the minority class [6], even though the minority class is usually of greater importance than the majority class [7, 8]. In fields such as toxicology and disease diagnosis, bias towards the majority class may result in a higher rate of false negative predictions [1].…”
Section: Introduction (mentioning)
confidence: 99%
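
The effect described in the statements above can be illustrated with a short arithmetic sketch. It is not taken from the cited paper or from any of the citing papers: the dataset size (10 actives, 10,000 inactives, matching the quoted 1:1000 ratio) and the helper name `confusion_metrics` are assumptions chosen for illustration. It shows that a classifier biased entirely towards the majority class can score about 99.9% accuracy while its sensitivity and precision for the minority class are both zero.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return (accuracy, sensitivity, precision) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators; report 0.0 when a metric is undefined.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, sensitivity, precision


# Hypothetical dataset at a 1:1000 imbalance ratio (assumed sizes).
n_active, n_inactive = 10, 10_000

# A classifier that labels every compound inactive: all actives become
# false negatives, all inactives become true negatives.
acc, sens, prec = confusion_metrics(tp=0, fp=0, fn=n_active, tn=n_inactive)
print(f"accuracy={acc:.4f} sensitivity={sens:.1f} precision={prec:.1f}")
```

Because accuracy is dominated by the majority class, minority-class metrics such as sensitivity and precision are the more informative ones at these imbalance ratios.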
“…The problem of data imbalance is ubiquitous in practical applications, affecting domains such as cancer malignancy grading [8], industrial systems monitoring [9], fraud detection [10], behavioral analysis [11] and cheminformatics [12]. Furthermore, data imbalance typically leads to the more costly type of error, for instance by inducing false negatives in the medical problem domain.…”
Section: Introduction (mentioning)
confidence: 99%