2023
DOI: 10.1186/s43067-023-00123-z

Object detection using convolutional neural networks and transformer-based models: a review

Shrishti Shah,
Jitendra Tembhurne

Abstract: Transformer models are evolving rapidly in standard natural language processing tasks, and their application in computer vision (CV) is growing quickly as well. Transformers are either replacing convolutional networks or being used in conjunction with them. This paper aims to differentiate the design of models built on convolutional neural networks (CNNs) from that of transformer-based models, particularly in the domain of object detection. CNNs are designed to capture local spatial patterns through convol…

Citations: cited by 7 publications (5 citation statements)
References: 61 publications
“…In order to comprehensively evaluate the performance of the model on the rice weed detection task, we conducted comparative experiments on the rice weed dataset, comparing the RMS-DETR model with other classic DETR variants, including Deformable DETR [14], Anchor DETR [40], and DAB-DETR [41]. The experimental results are presented in Table 7 and Fig.…”
Section: Results (mentioning)
confidence: 99%
“…In recent years, extensive research has been conducted on CNN-based object detectors. These detectors are primarily categorized into two-stage networks and one-stage networks, with the representative models being the R-CNN series and the YOLO series [13, 14]. Zhang et al. [15] embedded the CBAM attention mechanism after the pooling layers in the latter part of VGG19, forming the VGG19-CBAM structure as the optimal backbone feature extraction network for the Faster R-CNN model.…”
Section: Introduction (mentioning)
confidence: 99%
“…In most machine learning and convolutional neural network (CNN) based detectors [7,8], the success of detection relies on collecting and labeling a large amount of training and testing data in a variety of road environments. If self-driving vehicles are limited to traveling within a specific geographic region, and their training data collection is limited in that region, such detectors may provide good detection accuracy when operated in the same region or in similar environments.…”
Section: Introduction (mentioning)
confidence: 99%