2021
DOI: 10.1155/2021/6675279

Handling Imbalance Classification Virtual Screening Big Data Using Machine Learning Algorithms

Abstract: Virtual screening is the most critical process in drug discovery, and it relies on machine learning to facilitate the screening process. It enables the discovery of molecules that bind to a specific protein to form a drug. Despite its benefits, virtual screening generates enormous data and suffers from drawbacks such as high dimensions and imbalance. This paper tackles data imbalance and aims to improve virtual screening accuracy, especially for a minority dataset. For a dataset identified without considering …

Citations: cited by 14 publications (7 citation statements)
References: 35 publications
“…Previous studies dealing with imbalanced intrusion detection data solutions [30] and [40] used datasets from the KDD website. [41] used CMS Medicare Data, and [31] used the PubChem dataset library. While other studies devoted themselves to research dealing with a specific type of data, as in [33], the histopathological dataset was used and the fault dataset in [3] was used as classification training samples.…”
Section: Types Of Datasets Used In Related Work (mentioning)
confidence: 99%
“…Furthermore, experimental data sourced from the scientific community might also be biased towards well-studied pathways or chemical classes (composition bias), which can limit the predictive scope, as well as the performance of the resultant model [52] , [51] . For example, the number of enzymatic reactions covered by the Rhea database (https://www.rhea-db.org/statistics) involving macromolecules and polymers is one and two orders of magnitude smaller than small molecule reactions, respectively.…”
Section: Publicly Available Data Sources For Machine Learning (mentioning)
confidence: 99%
“…The reported results of both methods LC-KNN, and RC-KNN showed better performance when tested on Big datasets. It's also worth highlighting other Big Data classification-related studies, like as [77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,105,108,93,109,110,111,112,113,114,115,114,116,117,118,119,120,121,122,123,124,…”
Section: Literature Review (mentioning)
confidence: 99%