2019
DOI: 10.1002/minf.201800082

PySpark and RDKit: Moving towards Big Data in Cheminformatics

Abstract: The authors present an implementation of the cheminformatics toolkit RDKit in a distributed computing environment, Apache Hadoop. Together with the Apache Spark analytics engine, wrapped by PySpark, resources from commodity scalable hardware can be employed for cheminformatic calculations and query operations with basic knowledge of Python programming and an understanding of resilient distributed datasets (RDDs). Three use cases of cheminformatics computing in Spark on the Hadoop cluster are presented; queryin…
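
As a rough illustration of the pattern the abstract describes (RDKit calls applied to an RDD through PySpark), a minimal sketch might look like the following. It is not taken from the paper: the function names `match_partition` and `query_smiles`, the benzene SMARTS pattern, and the local Spark master are all assumptions chosen for illustration, and a real Hadoop deployment would read its input from the cluster instead of an in-memory list.

```python
from typing import Iterable, Iterator, List


def match_partition(smiles_iter: Iterable[str], smarts: str) -> Iterator[str]:
    """Yield the SMILES strings in one partition that contain the SMARTS pattern.

    Hypothetical helper, not from the paper. RDKit is imported inside the
    function so that each Spark executor loads it locally.
    """
    from rdkit import Chem

    # Compile the query once per partition instead of once per molecule.
    pattern = Chem.MolFromSmarts(smarts)
    if pattern is None:
        raise ValueError(f"invalid SMARTS: {smarts!r}")
    for smi in smiles_iter:
        mol = Chem.MolFromSmiles(smi)  # returns None for unparsable input
        if mol is not None and mol.HasSubstructMatch(pattern):
            yield smi


def query_smiles(smiles: List[str], smarts: str = "c1ccccc1") -> List[str]:
    """Run the substructure filter over an RDD (assumes a local Spark install)."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[2]")  # assumption: local test master
        .appName("rdkit-substructure-sketch")
        .getOrCreate()
    )
    try:
        rdd = spark.sparkContext.parallelize(smiles)
        return rdd.mapPartitions(lambda part: match_partition(part, smarts)).collect()
    finally:
        spark.stop()


if __name__ == "__main__":
    # Phenol contains a benzene ring; ethanol does not.
    print(query_smiles(["c1ccccc1O", "CCO"]))
```

Using `mapPartitions` instead of a per-element `filter` is a design choice of this sketch: it lets the SMARTS query be compiled once per partition, which tends to matter when each partition holds many molecules.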

Citations: cited by 75 publications (46 citation statements)
References: 10 publications
“…5. All models were trained on an in-house big data server described in [40]. The evaluation of the models was conducted using the root mean squared error (RMSE, Eq.…”
Section: Model Training and Validation (mentioning)
confidence: 99%
“…The sequence- and structure-based descriptors were calculated by the RDKit [101] package that uses Python language, except for the DPC and PseAAC that were calculated using PyBioMed [102] package, and the NetC that was extracted from structures using Biopython package [103].…”
Section: Methods (mentioning)
confidence: 99%
“…One thousand one hundred and fourteen FDA-approved, non-nutraceutical drugs were collected from Drugbank 4.0 [55] and a reference dataset of 294 ligands with biological activity on CES1 (documented Ki) was gathered from Chembl [56]. Conformers for those molecules were either retrieved through PubChem [57], returning the experimental structure conformer if available, or generated using the open source cheminformatics library, RDKit [58].…”
Section: Methods (mentioning)
confidence: 99%