2022
DOI: 10.48550/arxiv.2205.08020
Preprint

Partial Product Aware Machine Learning on DNA-Encoded Libraries

Abstract: DNA-encoded libraries (DELs) are used for rapid large-scale screening of small molecules against a protein target. These combinatorial libraries are built through several cycles of chemistry and DNA ligation, producing large sets of DNA-tagged molecules. Training machine learning models on DEL data has been shown to be effective at predicting molecules of interest dissimilar from those in the original DEL. Machine learning chemical property prediction approaches rely on the assumption that the property of inte…

Cited by 6 publications
(14 citation statements)
References 8 publications
“…68 Thus, future work will include modeling DEL datasets with larger scale and higher chemical diversity and adapting more advanced machine learning models that can take truncated and byproducts of DELs in consideration. 51,52 In summary, we show that the approach of ECFP-based DNN model with MAP loss function can be applied to effectively process and denoise cell-based DEL selection datasets, and the method may also be suitable for other types of complex biological targets, 11 and this approach also demonstrated its potential for in silico screening of chemical libraries.…”
Section: Discussion (mentioning)
confidence: 78%
“…103 There are several aspects that warrants further development. First, truncated and byproducts are inevitable in DELs, 51,52,63 and they are not considered in the MAP metric or MAP model; second, CAS-DEL only contains the tripeptide scaffold and has limited chemical diversity, 81 which makes it difficult to be generalized to unknown datasets and probably has led to the relatively low recall rate in our virtual screening study; third, the framework used in the project is a traditional fully connected network, a different and more complex machine learning method may lead to better performance. 68 Thus, future work will include modeling DEL datasets with larger scale and higher chemical diversity and adapting more advanced machine learning models that can take truncated and byproducts of DELs in consideration.…”
Section: Discussion (mentioning)
confidence: 99%