DOI: 10.1109/tnnls.2022.3152990
Region-Object Relation-Aware Dense Captioning via Transformer

Cited by 75 publications (35 citation statements) · References 35 publications
“…Compared to handcrafted features, deep features are more discriminative. With deep features, computer vision tasks have made great progress in recent years, such as detection [24], [44], [57], segmentation [9], [55], and classification [58], [60]. These methods improve feature representation via better network or architecture design.…”
Section: Feature Representation and Distillation
confidence: 99%
“…Wang et al. [19] leverage a graph neural network to implicitly model the visual relationships among the objects/regions of interest in an image, which does not require pre-defined relationship classes. Shao et al. [20] propose a region-object correlation score unit to measure the importance of each region via a transformer-based architecture.…”
Section: End-to-End Dense Video Captioning
confidence: 99%
“…Shao et al. [20] propose a region-object correlation score unit to measure the importance of each region via a transformer-based architecture.…”
Section: Related Work
confidence: 99%
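The two statements above describe the reviewed paper's core mechanism only at a high level. As a rough illustration, and not the authors' actual implementation, the following PyTorch sketch shows one plausible way a region-object correlation score could be computed: each detected object is scored against a candidate caption region with scaled dot-product attention, and the scores pool a relation-aware region feature. All names here (RegionObjectScorer, d_model, the projection layers) are hypothetical.

```python
# Hypothetical sketch of a region-object correlation score unit.
# NOT the paper's implementation: it only illustrates scoring each
# detected object's relevance to a caption region with scaled
# dot-product attention and pooling a relation-aware feature.
import torch
import torch.nn as nn

class RegionObjectScorer(nn.Module):  # hypothetical name
    def __init__(self, d_model: int = 512):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)  # region feature -> query
        self.k_proj = nn.Linear(d_model, d_model)  # object features -> keys
        self.v_proj = nn.Linear(d_model, d_model)  # object features -> values
        self.scale = d_model ** -0.5

    def forward(self, region: torch.Tensor, objects: torch.Tensor):
        # region:  (B, d)    feature of one candidate caption region
        # objects: (B, N, d) features of N detected objects
        q = self.q_proj(region).unsqueeze(1)            # (B, 1, d)
        k = self.k_proj(objects)                        # (B, N, d)
        v = self.v_proj(objects)                        # (B, N, d)
        scores = torch.softmax((q @ k.transpose(1, 2)) * self.scale, dim=-1)
        fused = (scores @ v).squeeze(1)                 # relation-aware region feature
        return scores.squeeze(1), fused                 # (B, N), (B, d)

# Usage: score 36 detected objects against one region proposal.
scorer = RegionObjectScorer(d_model=512)
scores, fused = scorer(torch.randn(2, 512), torch.randn(2, 36, 512))
print(scores.shape, fused.shape)  # torch.Size([2, 36]) torch.Size([2, 512])
```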
“…We have seen that there has been massive progress in vision tasks, whether related to classification work [19] (action classification, image classification, and attribute classification) [20]-[24], or related to recognition work (object and scene recognition) [25]-[29]. Generating automatic descriptions of images is a newer task. As we know, most communication between machines and humans depends on natural language understanding and explanation.…”
Section: Image Captioning
confidence: 99%
“…Image captioning has mainly been tackled via RNN models [9]-[13] and Long Short-Term Memory networks (LSTMs) [14]-[18]. We have seen that there has been massive progress in vision tasks, whether related to classification work [19] (action classification, image classification, and attribute classification) [20]-[24], or related to recognition work (object and scene recognition) [25]-[29].…”
Section: Introduction
confidence: 99%
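The statement above surveys RNN/LSTM-based captioning without showing the mechanism. As a minimal, hypothetical sketch (not any cited paper's model), the generic encoder-decoder pattern these works build on feeds a CNN image feature into an LSTM as the first step and then decodes one word per step; vocab_size, embed_dim, and the 2048-dim feature size are illustrative assumptions.

```python
# Minimal, hypothetical sketch of LSTM-based image captioning:
# the generic encoder-decoder pattern the surveyed works build on,
# not any specific cited model.
import torch
import torch.nn as nn

class LSTMCaptioner(nn.Module):  # hypothetical name
    def __init__(self, vocab_size: int = 10000, embed_dim: int = 256, hidden: int = 512):
        super().__init__()
        self.img_proj = nn.Linear(2048, embed_dim)  # CNN feature -> word-embedding space
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)   # per-step word logits

    def forward(self, img_feat: torch.Tensor, tokens: torch.Tensor):
        # img_feat: (B, 2048) global CNN feature; tokens: (B, T) caption word ids
        img = self.img_proj(img_feat).unsqueeze(1)  # image acts as the first "word"
        seq = torch.cat([img, self.embed(tokens)], dim=1)
        out, _ = self.lstm(seq)
        return self.head(out)                       # (B, T+1, vocab_size)

model = LSTMCaptioner()
logits = model(torch.randn(2, 2048), torch.randint(0, 10000, (2, 12)))
print(logits.shape)  # torch.Size([2, 13, 10000])
```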