2020 IEEE International Conference on Big Data (Big Data)
DOI: 10.1109/bigdata50022.2020.9378063
Optimization of Graph Neural Networks with Natural Gradient Descent

Cited by 20 publications (3 citation statements)
References 10 publications
Citation types: 0 supporting, 3 mentioning, 0 contrasting

“…Gradient descent is used to train deep networks by minimizing a predefined cost function of the output layer, expressed as the negative log-likelihood. Among a plentitude of deep network models, the type considered here is the deep belief network (DBN), which is built from stacked restricted Boltzmann machines (RBMs) [24]. Since both kinds, i.e., DBNs and RBMs, do not work with the 2D structure of images to extract a given feature, the required weights need to be learned separately for each pixel.…”
Section: Proposed Deep Belief Learning Network
mentioning
confidence: 99%
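
A minimal sketch of the training scheme described in that statement, assuming a plain softmax output layer: gradient descent on the negative log-likelihood of the output layer. The data, sizes, and learning rate below are illustrative assumptions, not taken from the cited paper.

# Plain gradient descent on the negative log-likelihood of a softmax output layer.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 10))            # 32 examples, 10 input features (illustrative)
y = rng.integers(0, 3, size=32)          # 3 classes
W = np.zeros((10, 3))                    # output-layer weights
lr = 0.1                                 # learning rate (illustrative)

for _ in range(100):
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)                     # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    nll = -np.log(probs[np.arange(len(y)), y]).mean()               # negative log-likelihood (monitoring)
    grad = probs.copy()
    grad[np.arange(len(y)), y] -= 1.0                               # d(NLL)/d(logits)
    W -= lr * (X.T @ grad) / len(y)                                 # gradient-descent step
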
“…NGD transforms gradients into so-called natural gradients, which have proved to be much faster than stochastic gradient descent (SGD). Recently, the work in [15] used NGD for a semi-supervised classification task in GCNs, and it showed encouraging results in both accuracy and convergence speed on some benchmark datasets.

Algorithm 1 (Preconditioning using NGD). Input: gradients of the parameters ∇W_l for l = 1, ..., m, adjacency matrix A, degree matrix D, training mask z, regularization hyper-parameters λ, ε. Step 1: derive the numbers of labeled and unlabeled vertices from the training mask via n_l = sum(z) and n = dim(z). Let [∆a_ij] denote the entries of ∆A.…”
Section: Graph Reconstruction
mentioning
confidence: 99%
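
A minimal sketch of the natural-gradient preconditioning idea referenced above: the average gradient is multiplied by the inverse of a damped empirical Fisher estimate built from per-example gradients. The function name, the damping term, and the explicit dense Fisher matrix are illustrative assumptions, not the Algorithm 1 of the quoted work.

import numpy as np

def natural_gradient_step(per_example_grads, mean_grad, lr=0.1, damping=1e-3):
    # per_example_grads: (n, d) flattened per-example gradients
    # mean_grad: (d,) average gradient of the loss
    n, d = per_example_grads.shape
    fisher = per_example_grads.T @ per_example_grads / n     # empirical Fisher estimate
    fisher += damping * np.eye(d)                            # damping (role of the λ/ε hyper-parameters)
    nat_grad = np.linalg.solve(fisher, mean_grad)            # natural gradient F^{-1} g
    return -lr * nat_grad                                    # parameter update

g = np.random.default_rng(1).normal(size=(64, 5))            # 64 per-example gradients, 5 parameters
update = natural_gradient_step(g, g.mean(axis=0))

Solving the damped linear system stands in for whatever practical approximation (diagonal, Kronecker-factored, etc.) would be used at scale.
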
“…However, any extra information about the gradients is often impossible or hard to obtain. Motivated by NGD, we introduce a preconditioning algorithm that uses the second moment of the gradient to approximate the parameters' Fisher information matrix under the prediction distribution [15].…”
mentioning
confidence: 99%
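
A minimal sketch of that second-moment preconditioning idea, assuming a diagonal approximation of the Fisher information matrix by a running average of squared gradients; the function and hyper-parameter names (precondition, beta, eps) are illustrative, not taken from the cited papers.

import numpy as np

def precondition(grad, second_moment, beta=0.99, eps=1e-8):
    # grad, second_moment: arrays with the same shape as the parameters
    second_moment = beta * second_moment + (1 - beta) * grad**2   # running E[g * g]
    preconditioned = grad / (second_moment + eps)                 # diag-Fisher^{-1} g
    return preconditioned, second_moment

m = np.zeros(3)
g = np.array([0.5, -0.2, 0.1])
step, m = precondition(g, m)                                      # one preconditioned gradient
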