2022
DOI: 10.1093/bib/bbac099

An efficient curriculum learning-based strategy for molecular graph learning

Abstract: Computational methods have been widely applied to resolve various core issues in drug discovery, such as molecular property prediction. In recent years, a data-driven computational method, deep learning, has achieved a number of impressive successes in various domains. In drug discovery, graph neural networks (GNNs) take molecular graph data as input and learn graph-level representations in non-Euclidean space. A large number of well-performing GNNs have been proposed for molecular graph learning. Meanwhile, …
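The paper's own curriculum strategy is not reproduced in this record, so the sketch below is only a rough illustration of the general curriculum learning idea: order a toy molecular dataset by a hypothetical difficulty proxy (here, SMILES length) and release progressively harder samples to a stub training step. The `Molecule` class, the difficulty measure, and the training stub are all assumptions for illustration, not the authors' method.

```python
# Minimal curriculum learning sketch for molecular data (illustrative only).
# Assumption: difficulty is approximated by SMILES length; the training step is a stub.

from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Molecule:
    smiles: str
    label: float


def difficulty(mol: Molecule) -> int:
    """Hypothetical difficulty proxy: longer SMILES ~ larger, harder molecule."""
    return len(mol.smiles)


def curriculum_stages(data: Sequence[Molecule], n_stages: int) -> List[List[Molecule]]:
    """Order samples from easy to hard and release them stage by stage."""
    ordered = sorted(data, key=difficulty)
    stages = []
    for stage in range(1, n_stages + 1):
        # At stage k, the model sees the easiest k/n_stages fraction of the data.
        cutoff = max(int(len(ordered) * stage / n_stages), 1)
        stages.append(ordered[:cutoff])
    return stages


def train(data: Sequence[Molecule], train_step: Callable[[List[Molecule]], float],
          n_stages: int = 4, epochs_per_stage: int = 2) -> None:
    for stage_idx, subset in enumerate(curriculum_stages(data, n_stages), 1):
        for epoch in range(epochs_per_stage):
            loss = train_step(list(subset))
            print(f"stage {stage_idx} epoch {epoch}: {len(subset)} mols, loss={loss:.3f}")


if __name__ == "__main__":
    toy = [Molecule("C", 0.1), Molecule("CCO", 0.5), Molecule("c1ccccc1O", 0.9)]
    # Stub training step: pretend the loss is the mean label of the current subset.
    train(toy, train_step=lambda batch: sum(m.label for m in batch) / len(batch))
```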

Cited by 14 publications (7 citation statements) · References 68 publications
“…First, from the data quality control aspect, some of the sub-datasets and targets in our integrated benchmark dataset should be filtered as all tested LBVS methods could not give reliable predictions on them, but they are still retained in the current version; Second, from the model training aspect, a predictable optimization is to pre-train EGNN-based models using molecular conformation and finetune them for the downstream bioactivity prediction [ 57 , 58 , 59 ]. Also, improving the GNN model training period with chemical domain knowledge insights is another promising strategy [ 60 , 61 ]. Thirdly, based on the architecture design aspect, more advanced GNN backbones that use 3D graphs as inputs should be adopted for conformer-based representation learning, such as SchNet [ 62 ], GemNet [ 63 ], PaiNN [ 64 ], etc.…”
Section: Discussion (mentioning)
Confidence: 99%
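As a loose illustration of the pretrain-then-finetune idea quoted above, the PyTorch sketch below pre-trains a stub encoder on a toy coordinate-denoising task and then fine-tunes it with a fresh head on synthetic bioactivity labels. The plain MLP encoder, the denoising objective, and all hyperparameters are placeholders of my own; a real setup would use an equivariant 3D backbone such as SchNet or PaiNN and real conformer data.

```python
# Pretrain-then-finetune sketch (toy data, stub encoder; not the cited EGNN setup).
import torch
import torch.nn as nn

N_ATOMS, COORD_DIM, HIDDEN = 8, 3, 64

encoder = nn.Sequential(nn.Linear(N_ATOMS * COORD_DIM, HIDDEN), nn.ReLU(),
                        nn.Linear(HIDDEN, HIDDEN))
pretrain_head = nn.Linear(HIDDEN, N_ATOMS * COORD_DIM)   # e.g. denoise coordinates
finetune_head = nn.Linear(HIDDEN, 1)                     # e.g. predict bioactivity

# Phase 1: self-supervised pre-training on conformations (toy coordinate denoising).
coords = torch.randn(32, N_ATOMS * COORD_DIM)
noisy = coords + 0.1 * torch.randn_like(coords)
opt = torch.optim.Adam(list(encoder.parameters()) + list(pretrain_head.parameters()), lr=1e-3)
for _ in range(50):
    opt.zero_grad()
    loss = nn.functional.mse_loss(pretrain_head(encoder(noisy)), coords)
    loss.backward()
    opt.step()

# Phase 2: fine-tune the shared encoder plus a new head on labelled bioactivity data.
labels = torch.randn(32, 1)
opt = torch.optim.Adam(list(encoder.parameters()) + list(finetune_head.parameters()), lr=1e-4)
for _ in range(50):
    opt.zero_grad()
    loss = nn.functional.mse_loss(finetune_head(encoder(coords)), labels)
    loss.backward()
    opt.step()
```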
“…In the area of polymer generation, hypergraph-based methods have emerged as a promising representation. Guo et al devised a learnable graph grammar approach to represent chain structures, their inherent scale, and structural complexity [135,136]. A hyperedge can join all nodes in a ring structure or only two nodes in a polymer chain [135]. A bottom-up search builds up production rules from the finest-grained level.…”
Section: Representing One-Dimensional (1D) Materials (mentioning)
Confidence: 99%
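Taking the quoted description at face value, a hyperedge is simply a set of node indices, so it can cover a whole ring or just the two atoms of a chain bond. The minimal sketch below models only that container; the learnable grammar and the bottom-up rule search of Guo et al are not implemented, and the atom labels are placeholders.

```python
# Minimal hypergraph container for polymer fragments (illustrative only).
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class Hypergraph:
    nodes: List[str] = field(default_factory=list)                # atom labels
    hyperedges: List[FrozenSet[int]] = field(default_factory=list)

    def add_nodes(self, labels: List[str]) -> List[int]:
        start = len(self.nodes)
        self.nodes.extend(labels)
        return list(range(start, len(self.nodes)))

    def add_hyperedge(self, node_ids: List[int]) -> None:
        self.hyperedges.append(frozenset(node_ids))


hg = Hypergraph()
ring = hg.add_nodes(["C"] * 6)          # six ring carbons
hg.add_hyperedge(ring)                  # one hyperedge covering the whole ring
chain = hg.add_nodes(["C", "C"])        # two backbone atoms of the chain
hg.add_hyperedge([ring[-1], chain[0]])  # two-node hyperedge: an ordinary bond
hg.add_hyperedge(chain)
print(len(hg.nodes), "nodes,", len(hg.hyperedges), "hyperedges")
```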
“…The development of reliable machine learning algorithms has been limited by the lack of standard benchmark datasets for comparing the efficacy of proposed methods (Jain and Nicholls, 2008). Furthermore, compared with other areas such as computer speech and vision, machine learning in chemistry has a main disadvantage, data recovery (Wu et al, 2018; Guo et al, 2022), because measuring chemical properties often requires specialized instruments; as a result, datasets with experimentally determined results are small and often not sufficiently large to cover the demanding needs of machine-learning tasks (Wu et al, 2018). Another challenge is data splitting (the way in which datasets are split into training data and testing data).…”
Section: Benchmark Databases (mentioning)
Confidence: 99%
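To make the data-splitting point concrete, the hedged sketch below contrasts a random split with a grouped split that keeps related molecules on the same side of the train/test boundary. The grouping key here is a crude stand-in of my own; real pipelines typically group by Bemis-Murcko scaffolds (e.g. via RDKit) rather than a string heuristic.

```python
# Random split versus grouped split for a toy SMILES dataset (illustrative only).
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def random_split(smiles: List[str], test_frac: float = 0.2, seed: int = 0) -> Tuple[List[str], List[str]]:
    rng = random.Random(seed)
    shuffled = smiles[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def grouped_split(smiles: List[str], key: Callable[[str], str],
                  test_frac: float = 0.2) -> Tuple[List[str], List[str]]:
    groups: Dict[str, List[str]] = defaultdict(list)
    for s in smiles:
        groups[key(s)].append(s)
    train, test = [], []
    # Assign whole groups, smallest first to the test set, so structurally
    # related molecules never straddle the train/test boundary.
    for members in sorted(groups.values(), key=len):
        (test if len(test) < test_frac * len(smiles) else train).extend(members)
    return train, test


data = ["CCO", "CCN", "c1ccccc1", "c1ccccc1O", "CC(=O)O", "CCCC"]
print(random_split(data))
print(grouped_split(data, key=lambda s: "aromatic" if "c1" in s else "aliphatic"))
```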