2021
DOI: 10.48550/arxiv.2110.00987
Preprint

Motif-based Graph Self-Supervised Learning for Molecular Property Prediction

Abstract: Predicting molecular properties with data-driven methods has drawn much attention in recent years. Particularly, Graph Neural Networks (GNNs) have demonstrated remarkable success in various molecular generation and prediction tasks. In cases where labeled data is scarce, GNNs can be pre-trained on unlabeled molecular data to first learn the general semantic and structural information before being finetuned for specific tasks. However, most existing self-supervised pre-training frameworks for GNNs only focus on…

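The pretrain-then-fine-tune workflow the abstract describes can be summarized with a short sketch. Everything below is an illustrative assumption rather than the paper's method: the GNNEncoder, the masked-atom pretext task in pretrain_step, the atom_type and y fields on the batch, and all hyperparameters are hypothetical stand-ins, and the paper's own pre-training objective is motif-based and differs in detail.

```python
# Minimal sketch: self-supervised pre-training of a GNN on unlabeled molecular
# graphs, followed by supervised fine-tuning on a labeled property. The masked
# atom-type pretext task and the batch fields `atom_type` / `y` are assumptions
# for illustration; they are not the paper's motif-based objective.
import torch
import torch.nn as nn
from torch_geometric.nn import GINConv, global_mean_pool


class GNNEncoder(nn.Module):
    """Two-layer GIN encoder producing per-node embeddings."""

    def __init__(self, in_dim: int, hidden: int = 128):
        super().__init__()

        def mlp(i, o):
            return nn.Sequential(nn.Linear(i, o), nn.ReLU(), nn.Linear(o, o))

        self.conv1 = GINConv(mlp(in_dim, hidden))
        self.conv2 = GINConv(mlp(hidden, hidden))

    def forward(self, x, edge_index):
        h = self.conv1(x, edge_index).relu()
        return self.conv2(h, edge_index)


def pretrain_step(encoder, pretext_head, batch, optimizer, mask_rate=0.15):
    """One self-supervised step: hide some node features and predict the
    original atom types of the masked nodes (a generic pretext task)."""
    mask = torch.rand(batch.num_nodes) < mask_rate
    x = batch.x.clone()
    x[mask] = 0.0                                  # zero out masked atom features
    h = encoder(x, batch.edge_index)
    logits = pretext_head(h[mask])                 # predict masked atom types
    loss = nn.functional.cross_entropy(logits, batch.atom_type[mask])
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()


def finetune_step(encoder, predictor, batch, optimizer):
    """One supervised step on a downstream (here binary) molecular property."""
    h = encoder(batch.x, batch.edge_index)
    g = global_mean_pool(h, batch.batch)           # graph-level readout
    loss = nn.functional.binary_cross_entropy_with_logits(
        predictor(g).squeeze(-1), batch.y.float())
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()
```

In the workflow the abstract describes, pretrain_step would first be run over a large unlabeled molecular corpus with an optimizer covering the encoder and pretext head; finetune_step would then be run on the small labeled set, typically with a fresh prediction head and a lower learning rate for the pre-trained encoder.
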
Cited by 6 publications (5 citation statements)
References 18 publications

“…SSL was initially applied in computer vision and NLP, which require large data sets for accurate representation learning. As SSL advances, it has been utilized to predict molecular properties. For example, it can extract features from unlabeled molecular data. Likewise, SSL can also extract features from genome data to predict genome function. Recently, many studies have shown that SSL can alleviate the problem of few samples or insufficient supervised information, making it widely applicable in image classification, recommender systems, protein analysis and design, speech recognition, and other fields.…”
Section: Methods For Small Molecular Data Challenges (mentioning)
confidence: 99%
“…325,326 For example, it can extract features from unlabeled molecular data. 28,327 Likewise, SSL can also extract features from genome data 328 to predict genome function. Recently, many studies have shown that SSL can alleviate the problem of few samples or insufficient supervised information, making it widely applicable in image classification, 329,330 recommender systems, 331 protein analysis and design, 332 speech recognition, 333 and other fields.…”
Section: Transformers (mentioning)
confidence: 99%
“…Motifs have been proven to benefit many tasks from exploratory analysis to transfer learning. Various algorithms have been proposed to exploit them for contrastive learning (Zhang et al. 2020), self-supervised pretraining (Zhang et al. 2021), generation (Jin, Barzilay, and Jaakkola 2020), protein design (Li et al. 2022) and drug-drug interaction prediction (Huang et al. 2020). But none of them take advantage of motifs to build a heterogeneous graph for molecular property prediction.…”
Section: Related Work (mentioning)
confidence: 99%
“…from publicly available databases like PubChem or molecular dynamics simulations. In order to utilize the availability of these unlabelled data and overcome the scarcity of labelled data, various self-supervised pre-training strategies have recently been devised for GNNs and have been successfully demonstrated in social network and biological domains [65][66][67][68][69]. However, their application to quantum mechanical properties has been limited and only demonstrated on simple small molecules like those in the QM7 and QM8 datasets [68].…”
mentioning
confidence: 99%