2022
DOI: 10.48550/arxiv.2201.06159
Preprint
YOLO -- You only look 10647 times

Abstract: With this work we explain the "You Only Look Once" (YOLO) single-stage object detection approach as a parallel classification of 10647 fixed region proposals. We support this view by showing that each of YOLO's output pixels is attentive to a specific sub-region of previous layers, comparable to a local region proposal. This understanding reduces the conceptual gap between YOLO-like single-stage object detection models, RCNN-like two-stage region-proposal-based models, and ResNet-like image classification …
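The 10647 in the title can be reproduced with a short back-of-the-envelope calculation. The sketch below assumes a YOLOv3-style head with a 416×416 input, three detection scales (strides 8, 16, and 32), and 3 anchors per grid cell — a common configuration, though the paper itself should be consulted for the exact setup:

```python
# Count the fixed "region proposals" a YOLOv3-style head evaluates in parallel.
# Assumed configuration: 416x416 input, three scales (strides 8, 16, 32),
# 3 anchors per grid cell.
input_size = 416
strides = [8, 16, 32]
anchors_per_cell = 3

# Each stride s yields a (input_size // s)^2 grid; every cell predicts
# anchors_per_cell boxes, each of which acts as one fixed region proposal.
total = sum((input_size // s) ** 2 * anchors_per_cell for s in strides)
print(total)  # 10647
```

The three grids contribute 52² + 26² + 13² = 3549 cells, and 3 anchors per cell gives the 10647 parallel classifications the abstract refers to.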

Cited by 3 publications (4 citation statements). References 6 publications.
“…See more details on our implementation in the Age Prediction notebook [5]. We have achieved good results in age prediction using the You Only Look Once (YOLO) architecture [9,10,11]. Although YOLO is an object detection algorithm, we mainly utilized its class prediction capabilities and abandoned the bounding box prediction functionality.…”
Section: Methods
confidence: 99%
“…Expressions and emotions from the target image are kept, while the facial identity is swapped. First, it collects a dataset of faces of two people, A and B, using an object detection method [102]. Secondly, it trains two auto-encoders E A , E B to encode and two decoders DC A , DC B to reconstruct the faces of A and B respectively.…”
Section: Face Replacement and Face Transfer
confidence: 99%
“…During training, 200 images were assigned to the test dataset. In the training process of the object detection, we need to define 9 anchor frames to accelerate the model training and improve the detection accuracy. The anchor frames are predefined bounding box patterns used by YOLOv5 to delineate regions for object candidates [19]. Each frame (x, y) indicates the width and height of the target size prediction bounding box.…”
Section: Object Detection
confidence: 99%
“…These frames were selected by using kmeans to cluster the labels in the training set. Please refer to [19] for more details regarding the selection of anchor frames. In this study, the following anchor frames are used.…”
Section: Object Detection
confidence: 99%
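The anchor-frame selection described in the last two citation statements — clustering the (width, height) pairs of the training labels into 9 groups — can be sketched with plain Lloyd's k-means. This is a minimal illustration, not the cited paper's code: YOLOv5's autoanchor uses an IoU-based distance metric, whereas the sketch below uses Euclidean distance for simplicity, and the input boxes are synthetic:

```python
import numpy as np

def kmeans_anchors(boxes, k=9, iters=50, seed=0):
    """Cluster (width, height) pairs into k anchor shapes via Lloyd's k-means.
    `boxes` is an (N, 2) array of ground-truth box sizes. Note: YOLOv5's
    autoanchor clusters with an IoU-based metric; Euclidean distance is used
    here only to keep the sketch short."""
    rng = np.random.default_rng(seed)
    # initialize centers from k randomly chosen label boxes
    centers = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        # assign each box to its nearest center
        d = np.linalg.norm(boxes[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned boxes
        for j in range(k):
            if (labels == j).any():
                centers[j] = boxes[labels == j].mean(axis=0)
    # return anchors sorted by area, small to large
    return centers[np.argsort(centers.prod(axis=1))]

# synthetic stand-in for a training label set: 500 (width, height) pairs
rng = np.random.default_rng(1)
boxes = rng.uniform(10, 300, size=(500, 2))
anchors = kmeans_anchors(boxes, k=9)
print(anchors.shape)  # (9, 2)
```

The 9 resulting (width, height) pairs play the role of the predefined anchor frames: three small, three medium, and three large shapes that the detection head refines rather than regressing box sizes from scratch.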