A new approach for the vanishing gradient problem on sigmoid activation

Roodschild, Matías; Sardiñas, Jorge Gotay; Will, Adrián

doi:10.1007/s13748-020-00218-y

Cited by 89 publications

(44 citation statements)

References 13 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…sigmoid and hyperbolic tangent activation functions). However, stacked neuron layers with these activation functions are often suffering from the vanishing and exploding gradient problems and, hence, not suitable for our purpose [38,39]. Meanwhile, the Rectified Linear Unit (ReLU) activation function is usually deployed instead as the default choice for multilayer perceptron and convolutional neural networks due to the generally good performance and faster learning capacity.…”

Section: Neural Network Design For Sobolev Training Of a Smooth Scalar Functional With Physical Constraintsmentioning

confidence: 99%

Sobolev training of thermodynamic-informed neural networks for interpretable elasto-plasticity models with level set hardening

Vlassis

Sun

2021

Computer Methods in Applied Mechanics and Engineering

132

View full text Add to dashboard Cite

We introduce a deep learning framework designed to train smoothed elastoplasticity models with interpretable components, such as the stored elastic energy function, yield surface, and plastic flow that evolve based on a set of deep neural network predictions. By recasting the yield function as an evolving level set, we introduce a deep learning approach to deduce the solutions of the Hamilton-Jacobi equation that governs the hardening/softening mechanism. This machine learning hardening law may recover any classical hand-crafted hardening rules and discover new mechanisms that are either unbeknownst or difficult to express with mathematical expressions. Leveraging Sobolev training to gain control over the derivatives of the learned functions, the resultant machine learning elastoplasticity models are thermodynamically consistent, interpretable, while exhibiting excellent learning capacity. Using a 3D FFT solver to create a polycrystal database, numerical experiments are conducted and the implementations of each component of the models are individually verified. Our numerical experiments reveal that this new approach provides more robust and accurate forward predictions of cyclic stress paths than those obtained from black-box deep neural network models such as the recurrent neural network, the 1D convolutional neural network, and the multi-step feed-forward model. c

show abstract

Section: Neural Network Design For Sobolev Training Of a Smooth Scalar Functional With Physical Constraintsmentioning

confidence: 99%

Sobolev training of thermodynamic-informed neural networks for interpretable elasto-plasticity models with level set hardening

Vlassis

Sun

2021

Computer Methods in Applied Mechanics and Engineering

132

View full text Add to dashboard Cite

show abstract

“…Firstly, the increase in the number of layers causes the increase of parameters in the network, which intensifies the storage difficulty and computational complexity of the model. Secondly, the increase in model depth may cause the risk of vanishing gradient [ 20 ], resulting in the inability to effectively update the parameters of the shallow convolution kernel. Thirdly, more training data is needed to prevent overfitting of the model [ 21 ].…”

Section: Problem Definitionmentioning

confidence: 99%

KecNet: A Light Neural Network for Arrhythmia Classification Based on Knowledge Reinforcement

Gao

et al. 2021

Journal of Healthcare Engineering

View full text Add to dashboard Cite

Acquiring electrocardiographic (ECG) signals and performing arrhythmia classification in mobile device scenarios have the advantages of short response time, almost no network bandwidth consumption, and human resource savings. In recent years, deep neural networks have become a popular method to efficiently and accurately simulate nonlinear patterns of ECG data in a data-driven manner but require more resources. Therefore, it is crucial to design deep learning (DL) algorithms that are more suitable for resource-constrained mobile devices. In this paper, KecNet, a lightweight neural network construction scheme based on domain knowledge, is proposed to model ECG data by effectively leveraging signal analysis and medical knowledge. To evaluate the performance of KecNet, we use the Association for the Advancement of Medical Instrumentation (AAMI) protocol and the MIT-BIH arrhythmia database to classify five arrhythmia categories. The result shows that the ACC, SEN, and PRE achieve 99.31%, 99.45%, and 98.78%, respectively. In addition, it also possesses high robustness to noisy environments, low memory usage, and physical interpretability advantages. Benefiting from these advantages, KecNet can be applied in practice, especially wearable and lightweight mobile devices for arrhythmia classification.

show abstract

“…The motivation for using neural networks in this work, in particular LSTM, is due to its proven efficiency in Sentiment Analysis. Although RNNs have proved to be successful in several tasks, it is well-known that it suffers from a significant drawback, namely the vanishing-gradient problem [61], making it difficult for the network to learn longdistance dependencies. The use of memory units in the network helps to overcome this drawback.…”

Section: Long-short-term Memory (Lstm)mentioning

confidence: 99%

Sentiment Analysis in Twitter Based on Knowledge Graph and Deep Learning Classification

2021

View full text Add to dashboard Cite

The traditional way to address the problem of sentiment classification is based on machine learning techniques; however, these models are not able to grasp all the richness of the text that comes from different social media, personal web pages, blogs, etc., ignoring the semantic of the text. Knowledge graphs give a way to extract structured knowledge from images and texts in order to facilitate their semantic analysis. This work proposes a new hybrid approach for Sentiment Analysis based on Knowledge Graphs and Deep Learning techniques to identify the sentiment polarity (positive or negative) in short documents, such as posts on Twitter. In this proposal, tweets are represented as graphs; then, graph similarity metrics and a Deep Learning classification algorithm are applied to produce sentiment predictions. This approach facilitates the traceability and interpretability of the classification results, thanks to the integration of the Local Interpretable Model-agnostic Explanations (LIME) model at the end of the pipeline. LIME allows raising trust in predictive models, since the model is not a black box anymore. Uncovering the black box allows understanding and interpreting how the network could distinguish between sentiment polarities. Each phase of the proposed approach conformed by pre-processing, graph construction, dimensionality reduction, graph similarity, sentiment prediction, and interpretability steps is described. The proposal is compared with character n-gram embeddings-based Deep Learning models to perform Sentiment Analysis. Results show that the proposal is able to outperforms classical n-gram models, with a recall up to 89% and F1-score of 88%.

show abstract

A new approach for the vanishing gradient problem on sigmoid activation

Cited by 89 publications

References 13 publications

Sobolev training of thermodynamic-informed neural networks for interpretable elasto-plasticity models with level set hardening

Sobolev training of thermodynamic-informed neural networks for interpretable elasto-plasticity models with level set hardening

KecNet: A Light Neural Network for Arrhythmia Classification Based on Knowledge Reinforcement

Sentiment Analysis in Twitter Based on Knowledge Graph and Deep Learning Classification

Contact Info

Product

Resources

About