2021
DOI: 10.1007/978-3-030-89941-7_13

Hyperparameter Tuning over an Attention Model for Image Captioning

Cited by 3 publications (3 citation statements)
References 12 publications
“…These activation functions, each with distinct characteristics, play a crucial role in introducing non-linearity into the neural network, enabling it to learn complex patterns in the data [33,34]. Possible optimizers are Adam, SGD, RMSprop, and AdamW [35]. The batch size options are set at 64, 128, or 256.…”
Section: A Multi-objective Optimization (mentioning)
confidence: 99%
“…In response to the uncertainties raised by the previously described experimental scenarios, the combination of crossentropy loss and Adam optimizer was highlighted as the best hyperparameter configuration according to the Top-5 Accuracy, BLEU-4, and loss value metrics. By reusing this configuration for the following experiment, different decisions can be made depending on the final purpose of the researcher [10]. If the architecture with the best metrics concerning response quality is required, the convolutional models ResNet-152 and ResNeXt-101 provided the best results in the metrics used in the previous experimentation.…”
Section: Introduction (mentioning)
confidence: 99%
“…Currently, computer vision (CV) tasks are useful for solving problems related to object detection, classification, object counting, visual surveillance, etc., taking advantage of video resources from public surveillance cameras located in many public areas (i.e., shopping malls, supermarkets, airports, train stations, stadiums, etc.) [9][10][11][12]. The problem of the correct/incorrect wearing of face masks implies two CV tasks: (1) object detection and (2) object classification.…”
Section: Introduction (mentioning)
confidence: 99%