2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021
DOI: 10.1109/iccv48922.2021.00683

Instances as Queries

Cited by 231 publications (117 citation statements)
References 29 publications
“…It started with DETR [18], using a transformer encoder-decoder and queries to detect a fixed number of objects. QueryInst [19] further adapted DETR to instance segmentation, by adding dynamic mask heads. Also based on DETR, MaskFormer [20] first segments the input image into a fixed number of masks and then classifies them.…”
Section: A Instance Segmentation Architectures (mentioning)
confidence: 99%
“…Frame-level VIS: Most video instance segmentation methods work at the frame-level fashion, a.k.a. tracking-by-segmentation [6,10,11,14,16,19,24,26,29]. This paradigm produces instance segmentation frame-by-frame and achieves tracking by linking the current instance mask to the history tracklet.…”
Section: Related Work (mentioning)
confidence: 99%
“…Existing methods typically solve the VIS problem at either frame-level or clip-level. The frame-level methods [10,14,16,24,26,29] follow a tracking-by-segmentation paradigm, which first performs image instance segmentation and then links the current masks with history tracklets via data association as shown in Fig. 1 (a).…”
Section: Introduction (mentioning)
confidence: 99%
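
The linking step that the two statements above describe can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: masks are represented as sets of pixel coordinates, association is greedy by mask IoU, and the names `mask_iou`, `link_masks` and `iou_threshold` are hypothetical. It is not the method of QueryInst or of any of the cited papers.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]  # assumed representation: the set of (row, col) pixels in the mask


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two pixel sets (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def link_masks(
    tracklets: Dict[int, Mask],
    masks: List[Mask],
    iou_threshold: float = 0.5,  # hypothetical cut-off, not taken from any paper
) -> List[int]:
    """Assign a track id to each current-frame mask by greedy IoU matching.

    `tracklets` maps track id -> most recent mask of that track and is
    updated in place. Unmatched masks start new tracks.
    """
    # Score every (track, mask) pair and visit the best matches first.
    pairs = sorted(
        (
            (mask_iou(last_mask, mask), track_id, mask_idx)
            for track_id, last_mask in tracklets.items()
            for mask_idx, mask in enumerate(masks)
        ),
        reverse=True,
    )

    assigned: Dict[int, int] = {}  # mask index -> track id
    used_tracks: Set[int] = set()
    for iou, track_id, mask_idx in pairs:
        if iou < iou_threshold:
            break  # remaining pairs overlap too little to be linked
        if track_id in used_tracks or mask_idx in assigned:
            continue
        assigned[mask_idx] = track_id
        used_tracks.add(track_id)

    next_id = max(tracklets, default=-1) + 1
    ids: List[int] = []
    for mask_idx, mask in enumerate(masks):
        track_id = assigned.get(mask_idx)
        if track_id is None:  # no sufficiently overlapping history: new track
            track_id = next_id
            next_id += 1
        tracklets[track_id] = mask  # the current mask becomes the track's history
        ids.append(track_id)
    return ids
```

Calling `link_masks` once per frame, with the per-frame instance masks as input, yields a track id for every mask. Published frame-level methods typically replace the greedy IoU rule with learned embeddings or optimal assignment.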
“…The advent of automatic feature engineering fuels deep convolution neural networks (CNNs) to reach the remarkable success in a plethora of computer vision tasks, such as image classification [9,10,30,35], object detection [6,17,20], and semantic segmentation [8,34]. In the path of pursuing better performance than that of early prototypes such as VGG [21] and ResNet [9], current deep learning models [11,16,30] generally are embodied with billions of…”
Section: Introduction (mentioning)
confidence: 99%