2019
DOI: 10.1007/s11063-019-10116-7

Second Order Training and Sizing for the Multilayer Perceptron

Abstract: Training algorithms for deep learning have recently been proposed with notable success, beating the state of the art in certain areas such as audio, speech and language processing. The key role is played by learning multiple levels of abstraction in a deep architecture. However, searching the parameter space in a deep architecture is a difficult task. By exploiting the greedy layer-wise unsupervised training strategy for deep architectures, the network parameters are initialized near a good local minimum. However,…

Cited by 13 publications (2 citation statements)
References 115 publications
“…These methods typically involve using a constructive algorithm to grow the neural architecture first, then prune the subsequent architecture, or simultaneously grow and prune during the learning process [25]. While using a hybrid approach is appealing and has had great success [16, 26–31]; the focus of the present article is on constructive algorithms.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
“…In the GAN network [23], the KL divergence is substituted into the objective function to solve the minimax game problem. Mapping to the tracking problem, we build a loss function based on minimizing the information loss of the KL divergence, and the model can be optimized and updated by solving the minimum value of the loss function [24]. In [25], the KL divergence is minimized to train the regression network.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%