2021
DOI: 10.33774/chemrxiv-2021-xd440
Preprint

Splitting chemical structure data sets for federated privacy-preserving machine learning

Abstract: With the increase in applications of machine learning methods in drug design and related fields, the challenge of designing sound test sets is becoming increasingly prominent. The goal is to split chemical structures (compounds) realistically between training, validation and test sets, such that performance on the test set is a meaningful indicator of performance in a prospective application. This challenge is on its own very interesting and relevant, but is even more complex in a federate…

Cited by 2 publications (2 citation statements)
References 16 publications
“…SparseChem is a package for machine learning models for biochemical applications capable of high-dimensional sparse input. The data were split into five folds (subsets) using locality sensitive hashing on molecular fingerprint features [20]. Three folds were used for training, whereas one was used as test and the other as validation fold.…”
Section: Weighting Based On Fraction Actives or Class Label Balance (mentioning)
confidence: 99%
“…SparseChem is a package for machine learning models for biochemical applications capable of high-dimensional sparse input. The data were split into 5 folds (subsets) using locality sensitive hashing on molecular fingerprint features [18]. Three folds were used for training, whereas one was used as test and the other as validation fold.…”
Section: Training (mentioning)
confidence: 99%
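
The fold scheme described in the citation statements (five folds from locality sensitive hashing on fingerprint features, three for training, one for validation, one for test) can be illustrated with a minimal sketch. The excerpts do not say which hashing scheme was used, so this sketch assumes simple bit-sampling LSH over binary fingerprints; the function names `lsh_fold_assignment` and `split_by_folds`, the number of sampled bits, and the choice of which folds serve as test and validation are all hypothetical and not taken from the cited work.

```python
import hashlib

import numpy as np


def lsh_fold_assignment(fingerprints, n_bits=16, n_folds=5, seed=0):
    """Assign each compound to a fold via bit-sampling LSH (assumed scheme).

    fingerprints: 2D array-like of 0/1 values, one row per compound.
    Compounds that agree on all sampled bits share a bucket, and every
    compound in a bucket lands in the same fold.
    """
    fps = np.asarray(fingerprints, dtype=np.uint8)
    rng = np.random.default_rng(seed)
    # Sample a fixed subset of fingerprint positions to form the hash key.
    n_sampled = min(n_bits, fps.shape[1])
    bit_idx = rng.choice(fps.shape[1], size=n_sampled, replace=False)
    folds = np.empty(fps.shape[0], dtype=np.int64)
    for i, row in enumerate(fps[:, bit_idx]):
        # hashlib gives a process-independent digest, unlike built-in hash().
        digest = hashlib.sha256(row.tobytes()).digest()
        folds[i] = int.from_bytes(digest[:8], "big") % n_folds
    return folds


def split_by_folds(folds, test_fold=0, valid_fold=1):
    """Return (train, valid, test) index arrays; remaining folds are training."""
    folds = np.asarray(folds)
    test = np.flatnonzero(folds == test_fold)
    valid = np.flatnonzero(folds == valid_fold)
    train = np.flatnonzero((folds != test_fold) & (folds != valid_fold))
    return train, valid, test
```

Because fold membership depends only on the sampled bits and the shared seed, two compounds with identical fingerprints always end up in the same fold. This keeps close structural neighbours from being spread across training and test data, which is the usual motivation for hashing-based splits.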