2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.01708
CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow

Cited by 83 publications
(31 citation statements)
References 25 publications
“…One is the "warm-start" [41], which simply warps the flows of the previous image pairs as the initialization for the next image pairs. Despite its simplicity, "warm-start" improves RAFT series [34,39] by non-trivial margins. The other one is PWC-Fusion [31], which fuses information from previous frames with a GRU-RCN at the bottleneck of U-Net.…”
Section: Related Work
confidence: 99%
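The "warm-start" idea quoted above can be sketched in a few lines: the flow estimated for the previous image pair is forward-warped by itself (assuming roughly constant velocity) and used to initialize the next pair's estimate. The function below is a minimal nearest-neighbor illustration of that idea, not RAFT's actual implementation, and the name `warm_start_init` is my own.

```python
import numpy as np

def warm_start_init(prev_flow):
    """Forward-warp the previous flow field by itself to initialize
    the next pair's flow estimate (a nearest-neighbor sketch of the
    "warm-start" initialization; real implementations differ)."""
    H, W, _ = prev_flow.shape
    init = np.zeros_like(prev_flow)
    ys, xs = np.mgrid[0:H, 0:W]
    # Where each pixel lands under the previous flow (rounded, clipped)
    xt = np.clip(np.round(xs + prev_flow[..., 0]).astype(int), 0, W - 1)
    yt = np.clip(np.round(ys + prev_flow[..., 1]).astype(int), 0, H - 1)
    # Constant-velocity assumption: carry each motion vector forward
    init[yt, xt] = prev_flow[ys, xs]
    return init
```

Pixels with no incoming vector keep a zero initialization; the iterative refinement is expected to correct these.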
“…Such a global encoding operation is especially beneficial in the hard cases of large displacement and occlusion. Other optical flow methods [43], [44] also enhance the image feature encoder with transformers, but their performance is not competitive. Transformers for Computer Vision.…”
Section: Related Work
confidence: 99%
“…RAFT [10] is a per-pixel feature extraction approach that constructs a multi-scale 4D correlation volume over all pixel pairs and updates the flow field iteratively through a recurrent unit. Like FlowNet, RAFT has inspired GMA [20] and CRAFT [11]. GMA addresses occlusions by modeling image self-similarities with a global motion aggregation module, a transformer-based approach that finds long-range dependencies between pixels in the first image and globally aggregates the corresponding motion features.…”
Section: Related Work
confidence: 99%
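The all-pairs 4D correlation volume mentioned in the statement above can be sketched as a dot product between every feature vector in frame 1 and every feature vector in frame 2. This is a simplified single-level illustration (RAFT additionally pools the last two dimensions to build a multi-scale pyramid); the scaling by the square root of the channel dimension follows common practice and is an assumption here.

```python
import numpy as np

def correlation_volume(feat1, feat2):
    """All-pairs correlation volume (single-level sketch): the dot
    product between each feature vector in frame 1 and each feature
    vector in frame 2, reshaped into a 4D volume (H, W, H, W)."""
    H, W, C = feat1.shape
    f1 = feat1.reshape(H * W, C)
    f2 = feat2.reshape(H * W, C)
    corr = (f1 @ f2.T) / np.sqrt(C)   # (H*W, H*W) similarity matrix
    return corr.reshape(H, W, H, W)   # index as vol[y1, x1, y2, x2]
```

During iterative refinement, the recurrent unit looks up a local window of this volume around each pixel's current flow estimate.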
“…In late fusion, the network is trained on each modality separately, and the results from the independent branches are then fused [24]. RAFT [10], GMA [20], and CRAFT [11] estimate the relationships between two consecutive frames using RGB images. Inspired by multimodal fusion, some of these works have been extended to compute both scene flow and optical flow by utilizing additional modalities such as depth and point clouds.…”
Section: Related Work
confidence: 99%
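The late-fusion scheme described above reduces, in its simplest form, to combining the independently trained branches' predictions after the fact. The sketch below uses a fixed weighted average purely for illustration; practical systems typically learn the fusion, and the function name and weights here are hypothetical.

```python
import numpy as np

def late_fusion(pred_rgb, pred_depth, w_rgb=0.5):
    """Late-fusion sketch: each modality's branch produces its own
    flow prediction; the predictions are combined afterwards with a
    weighted average (real systems usually learn this combination)."""
    return w_rgb * pred_rgb + (1.0 - w_rgb) * pred_depth
```

Early fusion, by contrast, would concatenate the modalities at the input and train a single joint network.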