Review and Comparison of Commonly Used Activation Functions for Deep Neural Networks

Szandała, Tomasz

doi:10.1007/978-981-15-5495-7_11

Cited by 240 publications

(127 citation statements)

References 28 publications

Supporting

Mentioning

122

Contrasting

Unclassified

Order By: Relevance

“…The hyperbolic tangent activation function (tanh) is proposed here in the input layer, and the sigmoid function in the output layer. They are used frequently in feedforward nets, and are suitable for shallow networks as well as applications of prediction and mapping [38,41].…”

Section: Input Layer ( and )mentioning

confidence: 99%

Estimating Weibull Parameters Using Least Squares and Multilayer Perceptron vs. Bayes Estimation

Aydi¹,

Al-Duais²

2022

Computers, Materials &Amp; Continua

View full text Add to dashboard Cite

The Weibull distribution is regarded as among the finest in the family of failure distributions. One of the most commonly used parameters of the Weibull distribution (WD) is the ordinary least squares (OLS) technique, which is useful in reliability and lifetime modeling. In this study, we propose an approach based on the ordinary least squares and the multilayer perceptron (MLP) neural network called the OLSMLP that is based on the resilience of the OLS method. The MLP solves the problem of heteroscedasticity that distorts the estimation of the parameters of the WD due to the presence of outliers, and eases the difficulty of determining weights in case of the weighted least square (WLS). Another method is proposed by incorporating a weight into the general entropy (GE) loss function to estimate the parameters of the WD to obtain a modified loss function (WGE). Furthermore, a Monte Carlo simulation is performed to examine the performance of the proposed OLSMLP method in comparison with approximate Bayesian estimation (BLWGE) by using a weighted GE loss function. The results of the simulation showed that the two proposed methods produced good estimates even for small sample sizes. In addition, the techniques proposed here are typically the preferred options when estimating parameters compared with other available methods, in terms of the mean squared error and requirements related to time.

show abstract

Section: Input Layer ( and )mentioning

confidence: 99%

Estimating Weibull Parameters Using Least Squares and Multilayer Perceptron vs. Bayes Estimation

Aydi¹,

Al-Duais²

2022

Computers, Materials &Amp; Continua

View full text Add to dashboard Cite

show abstract

“…Positive elements are preserved in ReLU and all the negative elements are discarded by fixing the corresponding activations to 0. When a negative input is supplied, ReLU function produces a 0, and when a positive input is supplied, it outputs a 1 [33].…”

Section: Activation Functions 411 Rectified Linear Unit Functionmentioning

confidence: 99%

Automatic text summarization of konkani texts using pre-trained word embeddings and deep learning

D’Silva

Sharma

2022

IJECE

View full text Add to dashboard Cite

<span lang="EN-US">Automatic text summarization has gained immense popularity in research. Previously, several methods have been explored for obtaining effective text summarization outcomes. However, most of the work pertains to the most popular languages spoken in the world. Through this paper, we explore the area of extractive automatic text summarization using deep learning approach and apply it to Konkani language, which is a low-resource language as there are limited resources, such as data, tools, speakers and/or experts in Konkani. In the proposed technique, Facebook’s fastText <br /> pre-trained word embeddings are used to get a vector representation for sentences. Thereafter, deep multi-layer perceptron technique is employed, as a supervised binary classification task for auto-generating summaries using the feature vectors. Using pre-trained fastText word embeddings eliminated the requirement of a large training set and reduced training time. The system generated summaries were evaluated against the ‘gold-standard’ human generated summaries with recall-oriented understudy for gisting evaluation (ROUGE) toolkit. The results thus obtained showed that performance of the proposed system matched closely to the performance of the human annotators in generating summaries.</span>

show abstract

“…The only difference is that Sigmoid lies between 0 and 1, whereas Tanh lies between 1 and −1. One of the main advantages of Sigmoid and Tanh activation functions is that the activations (i.e., the values in the nodes of the network, not the gradients) may not explode during the learning process, since their output range is bounded (Feng and Lu, 2019;Szandała, 2020). However, it should be pointed out that each activation function has its own strengths and limitations and its performance may be different based on the network complexity and data structure (Nwankpa et al, 2018;Feng and Lu, 2019).…”

Section: Ann Prediction Performancementioning

confidence: 99%

Application of Machine Learning Models to Predict Maximum Event Water Fractions in Streamflow

2021

View full text Add to dashboard Cite

Estimating the maximum event water fraction, at which the event water contribution to streamflow reaches its peak value during a precipitation event, gives insight into runoff generation mechanisms and hydrological response characteristics of a catchment. Stable isotopes of water are ideal tracers for accurate estimation of maximum event water fractions using isotopic hydrograph separation techniques. However, sampling and measuring of stable isotopes of water is laborious, cost intensive, and often not conceivable under difficult spatiotemporal conditions. Therefore, there is a need for a proper predictive model to predict maximum event water fractions even at times when no direct sampling and measurements of stable isotopes of water are available. The behavior of maximum event water fraction at the event scale is highly dynamic and its relationships with the catchment drivers are complex and non-linear. In last two decades, machine learning algorithms have become increasingly popular in the various branches of hydrology due to their ability to represent complex and non-linear systems without any a priori assumption about the structure of the data and knowledge about the underlying physical processes. Despite advantages of machine learning, its potential in the field of isotope hydrology has rarely been investigated. Present study investigates the applicability of Artificial Neural Network (ANN) and Support Vector Machine (SVM) algorithms to predict maximum event water fractions in streamflow using precipitation, soil moisture, and air temperature as a set of explanatory input features that are more straightforward and less expensive to measure compared to stable isotopes of water, in the Schwingbach Environmental Observatory (SEO), Germany. The influence of hyperparameter configurations on the model performance and the comparison of prediction performance between optimized ANN and optimized SVM are further investigated in this study. The performances of the models are evaluated using mean absolute error (MAE), root mean squared error (RMSE), coefficient of determination (R2), and Nash-Sutcliffe Efficiency (NSE). For the ANN, the results showed that an appropriate number of hidden nodes and a proper activation function enhanced the model performance, whereas changes of the learning rate did not have a major impact on the model performance. For the SVM, Polynomial kernel achieved the best performance, whereas Linear yielded the weakest performance among the kernel functions. The result showed that maximum event water fraction could be successfully predicted using only precipitation, soil moisture, and air temperature. The optimized ANN showed a satisfactory prediction performance with MAE of 10.27%, RMSE of 12.91%, R2 of 0.70, and NSE of 0.63. The optimized SVM was superior to that of ANN with MAE of 7.89%, RMSE of 9.43%, R2 of 0.83, and NSE of 0.78. SVM could better capture the dynamics of maximum event water fractions across the events and the predictions were generally closer to the corresponding observed values. ANN tended to underestimate the events with high maximum event water fractions and to overestimate the events with low maximum event water fractions. Machine learning can prove to be a promising approach to predict variables that are not always possible to be estimated due to the lack of routine measurements.

show abstract

Review and Comparison of Commonly Used Activation Functions for Deep Neural Networks

Cited by 240 publications

References 28 publications

Estimating Weibull Parameters Using Least Squares and Multilayer Perceptron vs. Bayes Estimation

Estimating Weibull Parameters Using Least Squares and Multilayer Perceptron vs. Bayes Estimation

Automatic text summarization of konkani texts using pre-trained word embeddings and deep learning

Application of Machine Learning Models to Predict Maximum Event Water Fractions in Streamflow

Contact Info

Product

Resources

About