2022
DOI: 10.1016/j.patcog.2021.108468

CANet: Co-attention network for RGB-D semantic segmentation

Cited by 82 publications (16 citation statements). References 6 publications.
“…While previous works reach high performance on standard RGB-based semantic segmentation benchmarks, in challenging real-world conditions, it is desirable to involve multi-modality sensing for a reliable and comprehensive scene understanding [13]. RGB-Depth [73], [74], [75], [76], [77] and RGB-Thermal [19], [78], [79], [80], [81] semantic segmentation are broadly investigated. Polarimetric optical cues [11], [82], [83] and event-driven priors [12], [84], [85] are often intertwined for robust perception under adverse conditions.…”
Section: Multi-modal Semantic Segmentation (mentioning)
confidence: 99%
“…For the specific operation of feature fusion, some simple parameter-free operations such as concat, weighted sum, or bilinear pooling [23] can build stable baseline performance. Additionally, several attention-based methods make feature fusion more flexible and learnable [1,11,14,22,49]. Moreover, alignment-based fusion methods focus on feature alignment; they typically use a flow field or deformable convolution [10,50] to align features of different levels in the spatial dimension [16,17,21,34,38].…”
Section: Feature Fusion (mentioning)
confidence: 99%
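To make the fusion families named in that statement concrete, here is a minimal PyTorch sketch contrasting a near-parameter-free weighted-sum baseline with a simple learnable channel-attention fusion. It illustrates the general idea only and is not code from any of the cited papers; the module and tensor names (`WeightedSumFusion`, `ChannelAttentionFusion`, `feat_a`, `feat_b`) are assumptions.

```python
# Illustrative sketch, not from the cited papers. Assumes two modality
# feature maps of identical shape (batch, channels, height, width).
import torch
import torch.nn as nn

class WeightedSumFusion(nn.Module):
    """Parameter-light baseline: a single learnable scalar blends the maps."""
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(0.5))  # learnable fusion weight

    def forward(self, feat_a, feat_b):
        return self.alpha * feat_a + (1 - self.alpha) * feat_b

class ChannelAttentionFusion(nn.Module):
    """Attention-based fusion: reweight concatenated channels, then project."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                           # global context
            nn.Conv2d(2 * channels, 2 * channels, kernel_size=1),
            nn.Sigmoid(),                                      # channel weights in (0, 1)
        )
        self.project = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_a, feat_b):
        x = torch.cat([feat_a, feat_b], dim=1)  # simple concat baseline
        return self.project(x * self.gate(x))   # learnable, attention-weighted

# Usage: both modules map two (2, 64, 32, 32) inputs to a fused (2, 64, 32, 32) output.
feat_a, feat_b = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
fused = ChannelAttentionFusion(64)(feat_a, feat_b)
```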
“…RAN [15] proposes a Residual Attention Network, built by stacking complex Attention Modules that generate attention-aware features which adapt in each layer. The Attention Module used in [15] is complex because it uses a bottom-up, top-down feedforward structure and inserts a trunk-and-mask attention mechanism based on hourglass modules [32] between the intermediate stages. CANet [58] tackles the RGB-D semantic segmentation task by proposing a co-attention network to construct a proper interaction between RGB and depth features. CANet mainly proposes a co-attention fusion module that utilizes position and channel co-attention to adaptively fuse RGB and depth features in the spatial and channel dimensions.…”
Section: Related Work A: Multi-scale (mentioning)
confidence: 99%
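As a rough illustration of how position and channel co-attention could adaptively fuse RGB and depth features in the spatial and channel dimensions, the sketch below derives a spatial gate and a per-channel gate from the concatenated modalities. This is a simplified reading of the idea described above, not CANet's actual co-attention fusion module; all names (`CoAttentionFusion`, `pos_att`, `chn_att`) are hypothetical.

```python
# Simplified sketch of position- and channel-wise gated fusion for RGB-D
# features. Not the authors' CANet implementation; names are assumptions.
import torch
import torch.nn as nn

class CoAttentionFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Position attention: one spatial map computed from both modalities.
        self.pos_att = nn.Sequential(
            nn.Conv2d(2 * channels, 1, kernel_size=1),
            nn.Sigmoid(),  # (B, 1, H, W): where depth should contribute
        )
        # Channel attention: per-channel weights from pooled joint features.
        self.chn_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),  # (B, C, 1, 1): which channels to emphasize
        )

    def forward(self, rgb, depth):
        joint = torch.cat([rgb, depth], dim=1)
        p = self.pos_att(joint)   # spatially adaptive gate
        c = self.chn_att(joint)   # channel-wise gate
        fused = rgb + p * depth   # inject depth where the spatial gate allows
        return c * fused          # reweight the fused map per channel

# Usage: fuse (2, 64, 32, 32) RGB and depth features into one (2, 64, 32, 32) map.
out = CoAttentionFusion(64)(torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32))
```

The design point the statement emphasizes is that both gates are computed jointly from the two modalities, so each stream can modulate the other rather than being fused by a fixed rule.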