2022
DOI: 10.1021/acs.jcim.2c00588

Self-Supervised Molecular Pretraining Strategy for Low-Resource Reaction Prediction Scenarios

Abstract: In the face of low-resource reaction training samples, we construct a chemical platform for addressing small-scale reaction prediction problems. Using a self-supervised pretraining strategy called MAsked Sequence to Sequence (MASS), the Transformer model absorbs the chemical information of about 1 billion molecules and is then fine-tuned on small-scale reaction prediction tasks. To further strengthen the predictive performance of our model, we combine MASS with a reaction transfer learning strategy. Here, we show…
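
The pretraining objective named in the abstract follows the MASS recipe: a contiguous span of the input sequence is masked on the encoder side and the decoder is trained to reconstruct exactly that span. Below is a minimal illustrative sketch of that span-masking step applied to SMILES strings; the regex tokenizer, function names, and 50% mask ratio are assumptions made for illustration, not the authors' exact implementation.

```python
import random
import re

# Minimal regex SMILES tokenizer (a common pattern in reaction-prediction work;
# the exact tokenizer used in the paper is an assumption here).
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+\]|Br|Cl|Si|@@|@|%\d{2}|[BCNOPSFIbcnops]|[-=#$/\\().+\d])"
)

def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens."""
    return SMILES_TOKEN_RE.findall(smiles)

def mass_mask(tokens: list[str], mask_ratio: float = 0.5, mask_token: str = "<mask>"):
    """MASS-style (source, target) pair: replace a contiguous token span with
    <mask> in the encoder input; the decoder target is that hidden span."""
    n = len(tokens)
    span_len = max(1, int(n * mask_ratio))
    start = random.randrange(0, n - span_len + 1)
    source = tokens[:start] + [mask_token] * span_len + tokens[start + span_len:]
    target = tokens[start:start + span_len]
    return source, target

if __name__ == "__main__":
    random.seed(0)
    smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin, used as a toy example
    src, tgt = mass_mask(tokenize_smiles(smiles))
    print("encoder input :", " ".join(src))
    print("decoder target:", " ".join(tgt))
```

In the paper's setting, pairs generated this way from a large unlabeled molecule corpus would pretrain the Transformer encoder–decoder before fine-tuning on the small labeled reaction dataset.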

Cited by 2 publications (1 citation statement)
References 43 publications

“…Some studies utilized SMILES as input and employed well-established models from the field of natural language processing (NLP) to encode the SMILES notations of chemical reactions into continuous vectors. These studies employed either pre-trained Transformer-based models [19–21] or Recurrent Neural Network (RNN) models [22, 23] on large-scale datasets and fine-tuned their models [3, 24] on downstream tasks to capture task-specific representations of chemical reactions. Other studies incorporated molecular graph structures to represent chemical reactions.…”
Section: Introduction
confidence: 99%