2021 · Preprint
DOI: 10.26434/chemrxiv-2021-zv6f1-v2

Perplexity-based molecule ranking and bias estimation of chemical language models

Abstract: Chemical language models (CLMs) can be employed to design molecules with desired properties. CLMs generate new chemical structures in the form of textual representations, such as simplified molecular-input line-entry system (SMILES) strings, in a rule-free manner. However, the quality of these de novo generated molecules is difficult to assess a priori. In this study, we apply the perplexity metric to determine the degree to which the molecules generated by a CLM match the desired design objectives. This …
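The core idea is straightforward to sketch: score each generated SMILES string by the exponentiated average negative log-likelihood its tokens receive under the trained CLM, then rank candidates by that score. Below is a minimal illustration, assuming a PyTorch model that maps token ids to next-token logits; the names (`smiles_perplexity`, `model`, `token_ids`, `candidates`) are placeholders, not the authors' code.

```python
import math
import torch
import torch.nn.functional as F

def smiles_perplexity(model, token_ids):
    """Perplexity of one tokenized SMILES string under a trained CLM.

    Assumes `model(ids)` returns next-token logits of shape
    (batch, seq_len, vocab); `token_ids` is a 1-D LongTensor.
    """
    model.eval()
    with torch.no_grad():
        # Feed tokens 0..n-2; the model predicts tokens 1..n-1.
        logits = model(token_ids[:-1].unsqueeze(0)).squeeze(0)
        log_probs = F.log_softmax(logits, dim=-1)
        # Mean negative log-likelihood of each actual next token.
        nll = -log_probs[torch.arange(len(token_ids) - 1), token_ids[1:]].mean()
    return math.exp(nll.item())

# Ranking de novo designs: lower perplexity means the model is less
# "surprised", i.e. the molecule sits closer to the design objective
# the CLM was trained (or fine-tuned) toward.
# ranked = sorted(candidates, key=lambda ids: smiles_perplexity(model, ids))
```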

Cited by 4 publications (12 citation statements) · References: 0 publications
“…Transfer learning 77 (applying a model's previously learned knowledge to a new, related problem by further training) was applied to the LSTM and transformer models, in agreement with previous studies 64,66,78,79. In a preliminary analysis, we explored transfer learning approaches for graph neural networks using self-supervision (context prediction 80, infomax 81, edge prediction 82, and masking 80).…”
Section: Deep Learning Methods (mentioning; confidence: 85%)
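The transfer-learning pattern this statement describes can be sketched in a few lines, assuming a PyTorch model pretrained on a broad corpus; the architecture, file name, and stand-in data below are illustrative assumptions, not the cited study's setup.

```python
import torch
import torch.nn as nn

# Toy model: an embedding carrying pretrained "general" knowledge
# plus a task head retrained on the new, related problem.
model = nn.Sequential(
    nn.Embedding(64, 128),          # pretrained layers
    nn.Flatten(start_dim=1),
    nn.Linear(128 * 16, 1),         # task-specific head
)
# model.load_state_dict(torch.load("pretrained.pt"))  # hypothetical checkpoint

# Freeze the pretrained embedding; fine-tune the rest at a reduced rate.
for p in model[0].parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

x = torch.randint(0, 64, (8, 16))   # stand-in batch of token ids
y = torch.rand(8, 1)                # stand-in bioactivity labels
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
```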
“…LSTMs (a type of recurrent neural network) can learn from string sequences by keeping track of long-range dependencies. As in a previous study 64, LSTM models were pre-trained on SMILES obtained by merging all training sets with no repetitions (36,281 molecules), using next-character prediction, before applying transfer learning for bioactivity prediction.…”
Section: SMILES-based Deep Learning Methods (mentioning; confidence: 99%)
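The next-character pretraining objective mentioned here amounts to shifted cross-entropy over a character vocabulary. A minimal PyTorch sketch follows; the vocabulary, layer sizes, and example molecule are toy assumptions, not the cited work's configuration.

```python
import torch
import torch.nn as nn

smiles = "CC(=O)Nc1ccccc1"                       # example molecule (acetanilide)
vocab = sorted(set(smiles))                       # toy character vocabulary
stoi = {ch: i for i, ch in enumerate(vocab)}
ids = torch.tensor([[stoi[ch] for ch in smiles]])

embed = nn.Embedding(len(vocab), 64)
lstm = nn.LSTM(64, 64, batch_first=True)
head = nn.Linear(64, len(vocab))
optimizer = torch.optim.Adam(
    list(embed.parameters()) + list(lstm.parameters()) + list(head.parameters()),
    lr=1e-3,
)

out, _ = lstm(embed(ids[:, :-1]))                # input: characters 0..n-2
logits = head(out)
loss = nn.functional.cross_entropy(              # targets: characters 1..n-1
    logits.reshape(-1, len(vocab)), ids[:, 1:].reshape(-1)
)
loss.backward()                                  # one pretraining step's gradient
optimizer.step()
```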