2015 IEEE International Conference on Big Data (Big Data)
DOI: 10.1109/bigdata.2015.7364091

Large-scale learning with AdaGrad on Spark

Cited by 46 publications (16 citation statements). References 2 publications.
“…[31] showed that existing CNNs might struggle to handle image regression, and unsatisfactory results were also […] The optimization algorithm utilized in the CIC is the Adaptive Moment Estimation (Adam) [35] optimizer, which is essentially Root Mean Square Propagation (RMSProp) [36] with a momentum factor. Adam integrates the advantages of Adaptive Gradient (AdaGrad) [37] and RMSProp and incurs a lower computational cost. Furthermore, it performs well for most nonconvex optimization problems, large data sets, and high-dimensional spaces.…”
Section: Image-based Reconstruction of the Heat Transfer Problem (mentioning)
confidence: 99%
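The relationship described in that statement can be made concrete with the per-parameter update rules. The following is a minimal NumPy sketch (function names, step sizes, and the state dictionary are illustrative, not taken from the cited papers): AdaGrad accumulates squared gradients, RMSProp replaces the running sum with an exponential moving average, and Adam adds a bias-corrected momentum term on top of the RMSProp-style second moment.

    import numpy as np

    def adagrad_step(w, g, state, lr=0.01, eps=1e-8):
        # AdaGrad: accumulate squared gradients and scale each coordinate's step.
        state["G"] = state.get("G", np.zeros_like(w)) + g ** 2
        return w - lr * g / (np.sqrt(state["G"]) + eps)

    def rmsprop_step(w, g, state, lr=0.001, rho=0.9, eps=1e-8):
        # RMSProp: replace AdaGrad's running sum with an exponential moving average.
        state["v"] = rho * state.get("v", np.zeros_like(w)) + (1 - rho) * g ** 2
        return w - lr * g / (np.sqrt(state["v"]) + eps)

    def adam_step(w, g, state, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
        # Adam: RMSProp-style second moment plus a momentum-style first moment,
        # both bias-corrected by the step count t (t starts at 1).
        state["m"] = b1 * state.get("m", np.zeros_like(w)) + (1 - b1) * g
        state["v"] = b2 * state.get("v", np.zeros_like(w)) + (1 - b2) * g ** 2
        m_hat = state["m"] / (1 - b1 ** t)
        v_hat = state["v"] / (1 - b2 ** t)
        return w - lr * m_hat / (np.sqrt(v_hat) + eps)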
“…The model with the least reconstruction error is selected to obtain sensor features. In order to minimize the reconstruction error, ADAGRAD is used as the optimization algorithm (Dean et al. 2012; Hadgu, Nigam & Diaz-Aviles 2015). The architecture finalized using the cross-validation procedure is presented in Table 1.…”
Section: Denoising Autoencoder Architecture Development (mentioning)
confidence: 99%
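As a rough illustration of that setup, the sketch below trains a denoising autoencoder on corrupted inputs and minimizes the reconstruction error with AdaGrad. This is a minimal PyTorch example with hypothetical layer sizes, noise level, and learning rate; it does not reproduce the architecture finalized in the cited paper's Table 1.

    import torch
    import torch.nn as nn

    class DenoisingAutoencoder(nn.Module):
        def __init__(self, n_inputs, n_hidden):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(n_inputs, n_hidden), nn.ReLU())
            self.decoder = nn.Linear(n_hidden, n_inputs)

        def forward(self, x):
            return self.decoder(self.encoder(x))

    model = DenoisingAutoencoder(n_inputs=64, n_hidden=32)      # hypothetical sizes
    optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    x = torch.randn(128, 64)                 # clean inputs (synthetic placeholder)
    x_noisy = x + 0.1 * torch.randn_like(x)  # corrupted inputs fed to the encoder

    for epoch in range(10):
        optimizer.zero_grad()
        loss = loss_fn(model(x_noisy), x)    # reconstruction error against the clean target
        loss.backward()
        optimizer.step()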
“…The network is then trained using a gradient-based optimization algorithm. There have been many gradient-based algorithms employed in machine learning, including stochastic gradient descent [33][34][35], stochastic gradient descent with Nesterov momentum [33][34][35][36], ADAGRAD [34,35,37], and ADAM [34,35,38]. Based on empirical findings, ADAGRAD and ADAM are widely used in practice because of their speed and accuracy [13,17,18,31], and because of this, ADAGRAD has been implemented in this work.…”
Section: Proposed Architecture (mentioning)
confidence: 99%
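For completeness alongside the adaptive updates sketched earlier, the two non-adaptive methods named in that list can be written as follows. This is an illustrative NumPy sketch with assumed hyper-parameters; grad_fn stands in for whatever routine computes the mini-batch gradient.

    import numpy as np

    def sgd_step(w, g, lr=0.01):
        # Plain stochastic gradient descent on a mini-batch gradient g.
        return w - lr * g

    def nesterov_sgd_step(w, grad_fn, state, lr=0.01, mu=0.9):
        # SGD with Nesterov momentum: evaluate the gradient at the look-ahead
        # point w + mu * v, fold it into the velocity, then take the step.
        v = state.get("v", np.zeros_like(w))
        g = grad_fn(w + mu * v)
        v = mu * v - lr * g
        state["v"] = v
        return w + v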
“…The feature-learning layer acts as an input to a fully connected network, where, for both models, two hidden layers with 2048 hidden units each are used. The output layer is modeled as a softmax layer [16,18,23,29,42,43], and the cross-entropy loss function [16,18,23,29,42,43] is minimized using the ADAGRAD [34,35,37] optimization algorithm with a learning rate of 0.005. These hyper-parameters were obtained using the training procedure shown in Fig.…”
Section: Accuracy (mentioning)
confidence: 99%
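A minimal sketch of that configuration, assuming PyTorch and hypothetical input and class counts, is shown below: two 2048-unit hidden layers, a softmax output realized through nn.CrossEntropyLoss (which fuses softmax and cross-entropy over logits), and AdaGrad with the stated learning rate of 0.005.

    import torch
    import torch.nn as nn

    # Hypothetical feature and class counts; the 2048-unit hidden layers follow the quote.
    n_features, n_hidden, n_classes = 512, 2048, 10

    model = nn.Sequential(
        nn.Linear(n_features, n_hidden), nn.ReLU(),
        nn.Linear(n_hidden, n_hidden), nn.ReLU(),
        nn.Linear(n_hidden, n_classes),      # logits; softmax is folded into the loss
    )
    loss_fn = nn.CrossEntropyLoss()          # softmax + cross-entropy in one call
    optimizer = torch.optim.Adagrad(model.parameters(), lr=0.005)

    x = torch.randn(32, n_features)                   # synthetic batch of features
    y = torch.randint(0, n_classes, (32,))            # synthetic class labels

    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()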