2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021
DOI: 10.1109/cvpr46437.2021.00700

Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

citations
Cited by 374 publications
(220 citation statements)
references
References 32 publications
“…Transfuser [35] and IA [42]. LBC is the state of the art on the NoCrash benchmark, and Transfuser is a very recent method utilizing sensor fusion.…”
Section: Methods
Citation type: mentioning
confidence: 99%
“…Various multi-modal fusion methods have been proposed for extracting inter-modal relationship and fusing representations [23][24][25][26]. In particular, the use of multi-head self-attention (MHSA) based frameworks such as Transformer has recently been reported to be effective for linking different modalities [26,27]. MHSA is defined as the case where…”
Section: Related Studies
Citation type: mentioning
confidence: 99%
“…Here, $W_i^Q$, $W_i^K$, and $W_i^V$ are projection parameters, and $d_k$ are dimension of $K$. By applying different projections on the input for each head, it is possible to model the relationship between the input sequences from multiple perspectives. Although MHSA was originally proposed for natural language processing, now it has been reported that by using joint sequences from two modalities as input (e.g., video and word or image and light detection and ranging (LiDAR) image), it is possible to extract the relationship between them [26,27].…”
Section: Related Studies
Citation type: mentioning
confidence: 99%
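
The two Related Studies statements above describe multi-head self-attention (MHSA) applied to a joint sequence built from two modalities. The equation in the citing paper is elided in the excerpt, so the following is only a minimal NumPy sketch using the standard scaled dot-product formulation, not the citing authors' or the TransFuser implementation. All names (`multi_head_self_attention`, `image_tokens`, `lidar_tokens`, `w_o`), the token counts, and the dimensions are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o):
    """Standard MHSA (assumed formulation, not taken from the excerpt).

    x:             (n, d_model) joint token sequence
    w_q, w_k, w_v: (h, d_model, d_k) per-head projections W_i^Q, W_i^K, W_i^V
    w_o:           (h * d_k, d_model) output projection (hypothetical name)
    """
    d_k = w_k.shape[-1]
    # Project the same input differently for each head.
    q = np.einsum("nd,hdk->hnk", x, w_q)
    k = np.einsum("nd,hdk->hnk", x, w_k)
    v = np.einsum("nd,hdk->hnk", x, w_v)
    # Scaled dot-product attention, one (n, n) map per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_k)
    attn = softmax(scores, axis=-1)
    heads = attn @ v  # (h, n, d_k)
    # Concatenate the heads along the feature axis, then project.
    concat = heads.transpose(1, 0, 2).reshape(x.shape[0], -1)
    return concat @ w_o, attn


rng = np.random.default_rng(0)
d_model, h, d_k = 16, 4, 4

# Hypothetical token sequences for two modalities, already embedded to d_model.
image_tokens = rng.normal(size=(6, d_model))
lidar_tokens = rng.normal(size=(5, d_model))

# Joint sequence: every token can attend to tokens of either modality.
joint = np.concatenate([image_tokens, lidar_tokens], axis=0)

w_q, w_k, w_v = (rng.normal(size=(h, d_model, d_k)) * 0.1 for _ in range(3))
w_o = rng.normal(size=(h * d_k, d_model)) * 0.1

out, attn = multi_head_self_attention(joint, w_q, w_k, w_v, w_o)
print(out.shape, attn.shape)
```

In this sketch, the block `attn[:, :6, 6:]` holds the weights with which image tokens attend to LiDAR tokens, which is one way to inspect the cross-modal relationship the statements refer to.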
“…[45] is an overview of learning-based SLAM and suggests that artificial intelligence (AI) and deep learning models can aid performance in cases with imperfect sensor measurement, environmental dynamics, or noise. Another application of deep learning in SLAM has been the improvement of sensor fusion in SLAM and end-to-end autonomous driving [46] [47] [48].…”
Section: Deep Learning Slam
Citation type: mentioning
confidence: 99%