2021
DOI: 10.1145/3464384

Evolution of Activation Functions: An Empirical Investigation

Abstract: The hyper-parameters of a neural network are traditionally designed through a time-consuming process of trial and error that requires substantial expert knowledge. Neural Architecture Search algorithms aim to take the human out of the loop by automatically finding a good set of hyper-parameters for the problem at hand. These algorithms have mostly focused on hyper-parameters such as the architectural configurations of the hidden layers and the connectivity of the hidden neurons, but there has been relatively l…

Cited by 9 publications (4 citation statements). References 32 publications (27 reference statements).
“…(3) The operation is performed independently for each input channel, resulting in an output with the same number of channels as the input. Here, Y_{i,j,k} is the value of the output feature map at position (i, j) and channel k, X […] The proposed model incorporates Swish activation functions [28], a choice made due to their well-known smoothness characteristics and effectiveness in enhancing overall model performance. Subsequently, a 1x1 convolutional bottleneck further processes the features.…”
Section: B. AdaptiveDRNet with ML-Attention: Model Architecture (mentioning)
confidence: 99%
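The excerpt above describes a depthwise convolution applied independently per channel, followed by a Swish activation and a 1x1 convolutional bottleneck. The following is a minimal, hypothetical PyTorch sketch of that general pattern; the class and parameter names are illustrative and are not taken from the cited paper.

```python
# Illustrative sketch only (not the cited paper's code): depthwise convolution,
# Swish activation, then a 1x1 convolutional bottleneck, assuming PyTorch.
import torch
import torch.nn as nn

class DepthwiseSwishBottleneck(nn.Module):
    def __init__(self, in_channels: int, bottleneck_channels: int, kernel_size: int = 3):
        super().__init__()
        # groups=in_channels makes the convolution depthwise: each channel is
        # filtered independently, so the output keeps the same channel count.
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=kernel_size // 2, groups=in_channels)
        # Swish(x) = x * sigmoid(x); PyTorch exposes it as SiLU.
        self.swish = nn.SiLU()
        # 1x1 convolution acting as a channel-mixing bottleneck.
        self.bottleneck = nn.Conv2d(in_channels, bottleneck_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.bottleneck(self.swish(self.depthwise(x)))

# Example: a 32-channel feature map reduced to 16 channels.
y = DepthwiseSwishBottleneck(32, 16)(torch.randn(1, 32, 64, 64))
print(y.shape)  # torch.Size([1, 16, 64, 64])
```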
“…A recent work by [17] focused on evolving AFs for neural networks. Their work differs in several aspects from our novel coevolutionary algorithm:…”
Section: Previous Work (mentioning)
confidence: 99%
“…e.g. an evolutionary approach was used to evolve the optimal activation function in [35, 100-116] and grid search using artificial data was used in [117]. Another search for the optimal activation functions was presented in [49], where several simple activation functions were found to perform remarkably well.…”
Section: Literature Review (mentioning)
confidence: 99%
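The excerpt above points to evolutionary approaches that evolve an activation function rather than hand-picking one. The following is a toy, self-contained sketch of that general idea, not the algorithm of any cited work: candidate activations are short compositions of primitive functions, mutated over generations and ranked by a placeholder fitness. A real search would instead train a small network with each candidate and score it on validation accuracy.

```python
# Toy sketch of evolving an activation function (not any cited paper's method).
import random
import numpy as np

PRIMITIVES = {
    "identity": lambda x: x,
    "tanh": np.tanh,
    "sigmoid": lambda x: 1.0 / (1.0 + np.exp(-x)),
    "relu": lambda x: np.maximum(0.0, x),
    "square": lambda x: x * x,
}

def make_activation(genome):
    # Compose the named primitives, e.g. ["relu", "tanh"] -> tanh(relu(x)).
    def act(x):
        for name in genome:
            x = PRIMITIVES[name](x)
        return x
    return act

def fitness(genome, xs):
    # Placeholder fitness: negative MSE against Swish, x * sigmoid(x), used only
    # to keep this toy loop runnable; a real search would train a small network
    # with the candidate activation and use validation accuracy instead.
    target = xs / (1.0 + np.exp(-xs))
    pred = make_activation(genome)(xs)
    return -float(np.mean((pred - target) ** 2))

def mutate(genome):
    # Replace one randomly chosen primitive in the composition.
    g = list(genome)
    g[random.randrange(len(g))] = random.choice(list(PRIMITIVES))
    return g

def evolve(pop_size=20, genome_len=2, generations=30, seed=0):
    random.seed(seed)
    xs = np.linspace(-4.0, 4.0, 200)
    pop = [[random.choice(list(PRIMITIVES)) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, xs), reverse=True)
        survivors = pop[: pop_size // 2]          # keep the better half
        pop = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return pop[0]

print(evolve())  # e.g. a two-step composition such as ['identity', 'sigmoid']
```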
“…Another search for the optimal activation functions was presented in [49], where several simple activation functions were found to perform remarkably well. These automatic approaches might be used for evolving the activation functions (e.g., [100, 105]) or for selecting the optimal activation function for a given neuron (e.g., [108, 118]). While evolved activation functions may perform well for a given problem, they may also be very complex, e.g., the evolved activation functions in [105].…”
Section: Literature Review (mentioning)
confidence: 99%
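The last excerpt also mentions selecting an activation function for a given neuron. One hypothetical way to parameterize such a per-neuron choice, not drawn from the cited works, is a learnable softmax-weighted mixture of candidate activations, sketched below in PyTorch; training then effectively "selects" an activation for each neuron.

```python
# Minimal sketch (an assumption, not a cited method): a layer whose activation
# is a per-neuron, softmax-weighted mixture of candidate functions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerNeuronActivation(nn.Module):
    def __init__(self, num_neurons: int):
        super().__init__()
        self.candidates = [torch.relu, torch.tanh, torch.sigmoid, F.silu]
        # One logit per (neuron, candidate); softmax gives mixture weights.
        self.logits = nn.Parameter(torch.zeros(num_neurons, len(self.candidates)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_neurons)
        weights = torch.softmax(self.logits, dim=-1)                     # (neurons, cands)
        stacked = torch.stack([f(x) for f in self.candidates], dim=-1)   # (batch, neurons, cands)
        return (stacked * weights).sum(dim=-1)

# Example: a linear layer followed by the per-neuron mixture activation.
layer = nn.Sequential(nn.Linear(8, 16), PerNeuronActivation(16))
out = layer(torch.randn(4, 8))
print(out.shape)  # torch.Size([4, 16])
```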