2021
DOI: 10.3390/mi12121504

Nonlinear Hyperparameter Optimization of a Neural Network in Image Processing for Micromachines

Abstract: Deep neural networks are widely used in the field of image processing for micromachines, such as in 3D shape detection in microelectronic high-speed dispensing and object detection in microrobots. It is already known that hyperparameters and their interactions impact neural network model performance. Taking advantage of the mathematical correlations between hyperparameters and the corresponding deep learning model to adjust hyperparameters intelligently is the key to obtaining an optimal solution from a deep n…

Cited by 8 publications (6 citation statements) · References 28 publications
“…The head_dim is the dimensionality of the multi-head attention-based hidden layer of the model, which affects the performance and computational cost of the model when the number of heads is a fixed value (generally 8). Additionally, we use dropout in the multi-head attention layer to reduce the multi-head attention weights, which reduces overfitting and improves the running speed of the model, and we set the parameter of the dropout function to 0.3-0.5 with reference to the literature [57]. Therefore, to obtain better entity recognition performance from the FLAT model, we conducted exploratory experiments on the optimization algorithm, learning rate (lr), number of model encoding layers, and head_dim with the CLUENER2020 dataset.…”
Section: FLAT Optimization With SGDM and Hyperparameter (mentioning)
confidence: 99%
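The configuration described in this statement (8 attention heads, a tunable head_dim, and attention dropout in the 0.3-0.5 range) maps directly onto a standard multi-head attention layer. Below is a minimal PyTorch sketch, not the citing authors' code; the class name and default values are illustrative, with head_dim and the dropout rate exposed as the hyperparameters being tuned.

```python
import torch
import torch.nn as nn

class TunableMultiHeadAttention(nn.Module):
    """Hypothetical sketch: head_dim and attention dropout as hyperparameters."""

    def __init__(self, head_dim: int = 64, num_heads: int = 8, dropout: float = 0.3):
        super().__init__()
        embed_dim = head_dim * num_heads  # hidden size scales with head_dim
        self.attn = nn.MultiheadAttention(
            embed_dim=embed_dim,
            num_heads=num_heads,   # generally fixed at 8, per the statement
            dropout=dropout,       # dropout applied to the attention weights
            batch_first=True,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.attn(x, x, x)  # self-attention over the input sequence
        return out

# Example: batch of 2 sequences, length 10, hidden size 8 * 64 = 512
layer = TunableMultiHeadAttention(head_dim=64, num_heads=8, dropout=0.4)
x = torch.randn(2, 10, 8 * 64)
print(layer(x).shape)  # torch.Size([2, 10, 512])
```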
“…Unlike the original FLAT model, we find the optimal model by experimenting with optimizers such as SGD and Adam and with different parameters (layers, lr, head_dim); the experimentally derived optimal model is then compared with classic NER methods to verify the superiority of the resulting model. Additionally, the parameter of the dropout function is set to 0.3-0.5 with reference to the literature [57], and the multi-head attention matrix is changed to a sparse matrix by using dropout in the multi-head attention layer to improve the running speed of our model. Other parameter settings are derived from the literature [52].…”
Section: Comparison Experiments With the Baseline Models (mentioning)
confidence: 99%
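A minimal sketch of the kind of exploratory sweep described in this statement, assuming a grid over optimizer choice (SGD vs. Adam), learning rate, encoder depth, and head_dim. The train_and_eval function is a hypothetical stand-in for training FLAT on CLUENER2020 and returning a validation F1 score; here it returns a random number so the sketch runs end to end.

```python
import itertools
import random

def train_and_eval(optimizer, lr, num_layers, head_dim):
    # Stand-in for training the model and returning a validation F1 score;
    # replaced by a random number so the sketch is self-contained.
    return random.random()

# Illustrative search space; the values are not taken from the cited work.
search_space = {
    "optimizer": ["sgd", "adam"],
    "lr": [1e-3, 5e-4, 1e-4],
    "num_layers": [1, 2, 4],
    "head_dim": [16, 20, 32],
}

best_score, best_config = -1.0, None
for combo in itertools.product(*search_space.values()):
    config = dict(zip(search_space.keys(), combo))
    score = train_and_eval(**config)
    if score > best_score:
        best_score, best_config = score, config

print("best score:", best_score)
print("best config:", best_config)
```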
“…It provides an overview of various CNN approaches utilized for image classification, segmentation, and styling. Next, in [19], the mathematical relationships among four hyperparameters, namely the learning rate, batch size, dropout rate, and convolution kernel size, were investigated in detail. A generalized multi-parameter mathematical correlation approach was derived, showing that these hyperparameters play a vital part in the efficiency of NN models.…”
Section: Related Work (mentioning)
confidence: 99%
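As a concrete illustration of the four hyperparameters named in this statement, the sketch below wires the learning rate, batch size, dropout rate, and convolution kernel size into a small CNN training loop in PyTorch. It is an assumed toy example with placeholder values and dummy data, not the experimental setup of [19].

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# The four hyperparameters studied in [19]; the values here are placeholders.
lr = 1e-3            # learning rate
batch_size = 32      # batch size
dropout_rate = 0.4   # dropout rate
kernel_size = 3      # convolution kernel size

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=kernel_size, padding=kernel_size // 2),
    nn.ReLU(),
    nn.Dropout(dropout_rate),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)

optimizer = torch.optim.SGD(model.parameters(), lr=lr)
loss_fn = nn.CrossEntropyLoss()

# Dummy data standing in for a real image dataset.
data = TensorDataset(torch.randn(256, 1, 28, 28), torch.randint(0, 10, (256,)))
loader = DataLoader(data, batch_size=batch_size, shuffle=True)

for images, labels in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```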
“…A dense-layer neuron conducts matrix-vector multiplication over the inputs from the neurons in the previous layer. As indicated in Equation (7), the usual formula for matrix-vector multiplication is as follows [37]: …”
Section: DenseNet-169 For Spectrogram Classification (mentioning)
confidence: 99%
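The quoted equation itself is truncated in the excerpt above. For reference only, and not necessarily the exact Equation (7) of the citing paper, the standard dense-layer matrix-vector form can be written as:

```latex
y_j = f\left( \sum_{i=1}^{n} w_{ji}\, x_i + b_j \right),
\qquad \text{or equivalently} \qquad
\mathbf{y} = f\left( W\mathbf{x} + \mathbf{b} \right),
```

where x is the output vector of the previous layer, W and b are the dense layer's weights and biases, and f is the activation function.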