2018
DOI: 10.1007/978-3-319-89743-1_22

Ensemble Learning for Large Scale Virtual Screening on Apache Spark

Abstract: Virtual screening (VS) is an in-silico tool for drug discovery that aims to identify candidate drugs by computationally screening large libraries of small molecules. Various ligand-based and structure-based virtual screening approaches have been proposed in recent decades. Machine learning (ML) techniques have been widely applied in the drug discovery and development process, predominantly in ligand-based virtual screening approaches. Ensemble learning is a very common paradigm in the ML field, where …

Cited by 5 publications (6 citation statements)
References 20 publications
“…Some papers [27,28] used deep learning to predict drug activity. This paper also investigates other works that used big data platforms to predict activity in virtual screening [31,32].…”
Section: 3 Results and Discussion (mentioning)
confidence: 99%
“…ET algorithm gives best accuracy results of 90% and precision of 0.86, but it takes 75 seconds. Apache Spark is used, as in [32], for different big data analytics methods (MLP, DT, NB, SVM and ET). ET gives the highest accuracy (94%) and precision (0.93), but it takes longer time than DT (Table 6).…”
Section: 3 Results and Discussion (mentioning)
confidence: 99%
“…Let M be the set of remaining local models. For each data record in the input dataset D, it produces the predicted label by taking the majority voting [31] of the local models in M.…”
Section: Machine Learning Algorithms Under Logo (mentioning)
confidence: 99%
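
The majority-voting step quoted above can be illustrated with a minimal Python sketch. It is not the citing paper's implementation: the names `majority_vote`, `predict_all`, `M`, and `D`, and the three threshold rules standing in for local models, are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def majority_vote(models, record):
    """Return the label predicted by the most models for one record."""
    # Each model is assumed to be a callable mapping a record to a label.
    votes = Counter(model(record) for model in models)
    # most_common(1) yields the (label, count) pair with the highest count.
    return votes.most_common(1)[0][0]

def predict_all(models, dataset):
    """Apply majority voting to every record in the dataset."""
    return [majority_vote(models, record) for record in dataset]

# Hypothetical stand-ins for the local models in M: simple threshold rules.
M = [
    lambda r: int(r[0] > 0.5),
    lambda r: int(r[1] > 0.5),
    lambda r: int(sum(r) > 1.0),
]

# Hypothetical input dataset D with two features per record.
D = [(0.9, 0.2), (0.1, 0.3), (0.7, 0.8)]

labels = predict_all(M, D)
print(labels)
```

With an odd number of binary classifiers, as here, a tie cannot occur; with an even number or more than two classes, a tie-breaking rule would need to be chosen explicitly.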
“…Although resampling methods are usually used to solve problems with imbalances in the class, there is little defined strategy to identify the acceptable class distribution for a particular dataset [18]. As a result, the optimal class distribution differs from one dataset to another.…”
Section: Introduction (mentioning)
confidence: 99%