2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
DOI: 10.1109/cvprw56347.2022.00299
Integrating Pose and Mask Predictions for Multi-person in Videos

Cited by 4 publications
(4 citation statements)
References 27 publications
“…The fact that videos can be represented by a set of vectors given by object centers, rather than by pixel-level information, as demonstrated in (Heo et al. 2022, 2023), encourages us to directly align spatial points in the embedding space to map the motion state of polyps. Therefore, we directly exploit e2 to establish temporal consistency.…”
Section: Cross-wise Scale Alignment
confidence: 99%
“…As shown in Figure 2, taking e2 as input, we first model dynamic information as center-perceived polyp motion information. We also use Transformer (Vaswani et al. 2017) layers, as in (Heo et al. 2022, 2023), to model center-based polyp motion information. In addition, we introduce a pixel-decoder (Cheng et al. 2022) to generate a learnable per-position embedding bias for the embedding space to cope with the dramatic variations in VPS.…”
Section: Cross-wise Scale Alignment
confidence: 99%
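The center-based temporal consistency this citing paper describes, aligning spatial points in the embedding space to recover polyp motion, can be sketched with a toy two-frame matcher. This is only a minimal illustration: the function name, the cosine-similarity matching rule, and the array layout are assumptions, not the cited papers' actual method or API.

```python
import numpy as np

def polyp_motion(centers_t0, embeds_t0, centers_t1, embeds_t1):
    """Match object-center embeddings across two frames; return displacements.

    centers_*: (N, 2) arrays of object-center coordinates per frame.
    embeds_*:  (N, D) arrays of center embeddings (standing in for the
               quoted e2 features; names and matching rule are illustrative).
    """
    # Cosine similarity between every t0 center and every t1 center.
    a = embeds_t0 / np.linalg.norm(embeds_t0, axis=1, keepdims=True)
    b = embeds_t1 / np.linalg.norm(embeds_t1, axis=1, keepdims=True)
    match = np.argmax(a @ b.T, axis=1)      # best t1 match per t0 object
    return centers_t1[match] - centers_t0   # per-object motion vector
```

Matching on embeddings rather than raw pixels is the point of the quoted passage: each object is a single vector, so temporal association reduces to a small similarity lookup.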
“…VITA [6] first performs instance predictions in each frame using object queries, and then employs an object decoder to associate instance predictions across frames. GenVIS [30] stores object queries from previous clips to guide feature learning in the current clip. Moreover, GenVIS can perform video instance segmentation in online mode.…”
Section: A. Video Instance Segmentation
confidence: 99%
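The frame-then-video design this citation describes, per-frame instance predictions followed by cross-frame association, can be sketched with a toy greedy matcher. This is a simplified stand-in for VITA's learned object decoder: the greedy cosine-similarity linking and all names below are illustrative assumptions, not the published architecture.

```python
import numpy as np

def associate_frames(frame_embeds):
    """Greedily link per-frame instance embeddings into video-level tracks.

    frame_embeds: list of (N, D) arrays, one per frame (same N assumed).
    Returns one track per instance: a list of detection indices, one per frame.
    """
    def norm(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    tracks = [[i] for i in range(frame_embeds[0].shape[0])]
    prev = frame_embeds[0]
    for cur in frame_embeds[1:]:
        # Cosine similarity: rows are existing tracks, columns are detections.
        sim = norm(prev) @ norm(cur).T
        used = set()
        for t, track in enumerate(tracks):
            # Claim the most similar detection not taken by an earlier track.
            for j in np.argsort(-sim[t]):
                if int(j) not in used:
                    used.add(int(j))
                    track.append(int(j))
                    break
        # Re-order embeddings so row t stays aligned with track t.
        prev = cur[[tr[-1] for tr in tracks]]
    return tracks
```

A production system would replace the greedy step with optimal assignment (e.g. the Hungarian algorithm) or, as in VITA, a learned decoder over object queries; the sketch only shows where frame-level and video-level reasoning split.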