2022
DOI: 10.48550/arxiv.2207.02696
Preprint

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

Abstract: YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56.8% AP among all known real-time object detectors with 30 FPS or higher on GPU V100. YOLOv7-E6 object detector (56 FPS V100, 55.9% AP) outperforms both transformer-based detector SWIN-L Cascade-Mask R-CNN (9.2 FPS A100, 53.9% AP) by 509% in speed and 2% in accuracy, and convolutional-based detector ConvNeXt-XL Cascade-Mask R-CNN (8.6 FPS A100, 55.2% AP) by 551% in speed and 0.…
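The speed percentages quoted in the abstract appear to be relative increases in frames per second over each baseline. The short sketch below reproduces them from the FPS and AP figures given in the abstract; the function and variable names are illustrative only and do not come from the paper. Note that, per the abstract, YOLOv7-E6 was timed on a V100 and the two baselines on an A100, so the comparison crosses hardware.

```python
def relative_speedup_percent(fps_new: float, fps_baseline: float) -> float:
    """Return how much faster fps_new is than fps_baseline, in percent."""
    return (fps_new - fps_baseline) / fps_baseline * 100.0


# Figures taken from the abstract (names are hypothetical labels).
yolov7_e6_fps, yolov7_e6_ap = 56.0, 55.9        # measured on V100
swin_l_fps, swin_l_ap = 9.2, 53.9               # measured on A100
convnext_xl_fps = 8.6                           # measured on A100

speedup_vs_swin = relative_speedup_percent(yolov7_e6_fps, swin_l_fps)
speedup_vs_convnext = relative_speedup_percent(yolov7_e6_fps, convnext_xl_fps)

# The "2% in accuracy" reads as an absolute gap in AP percentage points.
ap_gain_vs_swin = round(yolov7_e6_ap - swin_l_ap, 1)

print(f"vs SWIN-L:      {speedup_vs_swin:.0f}% faster, +{ap_gain_vs_swin} AP")
print(f"vs ConvNeXt-XL: {speedup_vs_convnext:.0f}% faster")
```

Under this reading, 56 FPS against 9.2 FPS rounds to 509% and 56 FPS against 8.6 FPS rounds to 551%, matching the abstract.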

Citations: cited by 1,028 publications (1,311 citation statements)
References: 69 publications
“…We found that an F1 ∼ 0.7 can be achieved at precision 70% even with an extremely limited training set, establishing This work demonstrates that modern region-based object detection architectures - i.e., architectures that identify and then classify regions of interest as opposed to sliding-window, classification-only architectures that classify every sub-region of an image - produce detection accuracies for LEs that allow for the searching of extremely large upcoming data sets for these objects, a task which may very well be unfeasible with sliding-window classification models. And while it is true that these region-based models are rapidly evolving (YOLO itself is now at version 7, Wang et al 2022, and other more recent models include Chen et al 2019; Wu et al 2019 for example), the core conceptual foundations are common to all such models. That is, the models presented in this paper have demonstrated that region-based models are applicable to astronomical data (low SNR, very high spatial resolution, etc.)…”
Section: Discussion (mentioning)
confidence: 99%
“…Among them, YOLOv5n is the latest lightweight algorithm of YOLOv5, and YOLOv3 is a relatively mature large-scale one-stage detection algorithm. YOLOv7 [28] is the latest algorithm of YOLO family at present, which has the strongest comprehensive performance in full-scale detection, and YOLOv7-tiny is a lightweight version of YOLOv7, which has similar parameter quantities and calculation quantities with SF-YOLOv5. ResNeXt-CSP is a new detector combined with classical algorithms ResNeXt [29] and CSPNet [32], which have excellent performance.…”
Section: Methods (mentioning)
confidence: 99%
“…have been introduced. In addition, the latest YOLOv7 [28] (which is still under update) and various improved algorithms based on ResNet [29, 30] also have excellent performance.…”
Section: Relevant Work (mentioning)
confidence: 99%
“…Redmon [6] et al. proposed the YOLO object detection framework, a neural network framework appropriate for targets detected in real-time and with a detection speed of 45 FPS for the first time. Later versions of YOLO, YOLO 9000, YOLO v3, YOLOv4, and YOLOv7 [7, 8, 9, 10] included new neural network architectures such as batch normalization [11], FPN [12], and SPP [13] to strike a balance between detection accuracy and speed. To build a genuinely anchorless detector, Tian [14] et al. suggested Fully Convolutional One-Stage Object Detection (FCOS) based on the RetinaNet [15] network architecture.…”
Section: Introduction (mentioning)
confidence: 99%