2022
DOI: 10.1007/978-3-031-19812-0_30

Inverted Pyramid Multi-task Transformer for Dense Scene Understanding

Abstract: Multi-task scene understanding aims to design models that can simultaneously predict several scene understanding tasks with one versatile model. Previous studies typically process multi-task features in a more local way, and thus cannot effectively learn spatially global and cross-task interactions, which hampers the models' ability to fully leverage the consistency of various tasks in multitask learning. To tackle this problem, we propose an Inverted Pyramid multi-task Transformer, capable of modeling cross-t…

Cited by 42 publications (12 citation statements)
References: 54 publications
“…Our model outperforms all the multitask baselines, i.e. ST-MTL [36], InvPT [70], Taskprompter [71], and MulT [3], respectively. For instance, our model correctly segments and predicts the surface normal of the elements within the yellow-circled region, unlike the baseline.…”
Section: Additional Qualitative Results (mentioning)
confidence: 88%
“…InvPT [70]: performs simultaneous modeling of spatial positions and multiple dense prediction tasks in a unified transformer framework.…”
Section: Transformer-based Methods (mentioning)
confidence: 99%
“…Other Efforts to Unify Dense Prediction Tasks Realizing segmentation, depth and surface normal are all pixelwise mapping problem, previous works [10,52,67,72,96] have deployed them to the same framework. Some works [11,12,103,104,106] improve performance by exploring the relations among dense prediction tasks. UViM [48] and Painter [96] unify segmentation and depth estimation, but neither of them is based on discretizing the continuous output space, and neither considers surface normal estimation.…”
Section: Related Work (mentioning)
confidence: 99%
“…Neural Networks (CNN). Several techniques have made significant advancements in extracting deep information, whether supervised [7], [8], [9], [10], semi-supervised [11], [12], [13] or unsupervised [14], [15], [16]. The first impressive single image depth estimation based on CNN, Eigen et al [7] estimated depth information using two independent deep neural networks.…”
Section: Various Approaches Have Accomplished Remarkable Improvements... (mentioning)
confidence: 99%