2022 International Conference on Robotics and Automation (ICRA)
DOI: 10.1109/icra46639.2022.9812060
Multi-modal Motion Prediction with Transformer-based Neural Network for Autonomous Driving

Cited by 77 publications (49 citation statements)
References 18 publications
“…Multi-modal fusion is a method of combining data collected from different modalities to achieve more accurate results [29]. It is widely used in various fields from affective computing [30] to autonomous driving [31]. Recent studies have demonstrated that by using a combination of visual, vocal, and textual data, it is possible to more accurately identify psychological patterns from multiple perspectives [30], [32].…”
Section: B. Multi-modal Fusion (citation type: mentioning)
confidence: 99%
“…Motion prediction is a thread of research that aims to predict long-term future motion trajectories of traffic participants based on their historical dynamic states and optionally the map information. With the introduction of Transformers or GNNs, recent motion prediction networks have gained the ability to effectively handle a heterogeneous mix of traffic entities, e.g., road polylines, traffic light state, and a dynamic set of agents, and achieved unprecedented prediction accuracy [14], [15], [26], [27]. However, most of the existing motion prediction models only focus on improving the prediction accuracy (i.e., position error), ignoring the applicability to the downstream planning task.…”
Section: B. Motion Prediction (citation type: mentioning)
confidence: 99%
“…Scene Encoder. The scene encoder is based on Transformer networks, following the structure of our previous works [14], [33]. For each surrounding agent i, the input data consists of its historical states x^i_{−T_h:0} and local map polylines I^i (i.e., a list of waypoints of nearby routes).…”
Section: B. Prediction Model (citation type: mentioning)
confidence: 99%
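The excerpt above describes the encoder's inputs: each agent's historical states over the past T_h steps plus nearby map polylines. A minimal numpy sketch (not the paper's actual architecture; token shapes and the single attention layer are illustrative assumptions) of how such heterogeneous inputs can be mixed as tokens by scaled dot-product attention:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Standard scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def encode_scene(agent_history, map_polylines):
    """Toy scene encoder (illustrative, not the cited model):
    agent history steps and map waypoints are concatenated into one
    token set, and each history step attends over all tokens.
    agent_history: (T_h, d) past states of one agent
    map_polylines: (M, d)  waypoints of nearby routes
    Returns one context vector per history step, shape (T_h, d)."""
    tokens = np.concatenate([agent_history, map_polylines], axis=0)  # (T_h+M, d)
    return scaled_dot_product_attention(agent_history, tokens, tokens)

rng = np.random.default_rng(0)
hist = rng.normal(size=(10, 8))   # T_h = 10 past states, feature dim 8
maps = rng.normal(size=(20, 8))   # 20 map waypoints
ctx = encode_scene(hist, maps)
print(ctx.shape)  # (10, 8)
```

A real encoder would add learned projections, positional information, and multiple layers; the point here is only that agent states and map polylines enter as one heterogeneous token set.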
“…It has been experimentally validated in urban settings involving complex scenarios using the CARLA urban driving simulator. Furthermore, Huang et al. [7] introduced a neural prediction framework utilizing the Transformer structure, with a multi-modal attention mechanism for representing social interactions between agents and predicting multiple trajectories for autonomous driving. In our approach, we do not train a transformer-based model from scratch, as this would be too expensive to do for each application.…”
Section: A. Multimodal Transformers (citation type: mentioning)
confidence: 99%
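The excerpt above mentions predicting multiple trajectories via multi-modal attention. A common pattern for this (sketched here with hypothetical names and random weights; this is not the cited paper's decoder) is to let K learned mode queries attend to the encoded scene, then project each mode feature to one candidate future trajectory plus a confidence score:

```python
import numpy as np

def multimodal_decode(context, mode_queries, T_f=12):
    """Toy multi-modal decoder (illustrative assumption, not the cited model):
    each of K mode queries attends over the scene context and is mapped to
    one future trajectory and a score, so K candidate futures are produced
    instead of a single averaged one.
    context:      (N, d) encoded scene tokens
    mode_queries: (K, d) one learned query per trajectory mode (hypothetical)
    Returns (K, T_f, 2) trajectories and (K,) softmax mode probabilities."""
    d = context.shape[-1]
    scores = mode_queries @ context.T / np.sqrt(d)          # (K, N)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    mode_feats = w @ context                                # (K, d)
    # Hypothetical linear head: project each mode feature to a trajectory
    # of T_f (x, y) waypoints; random weights stand in for learned ones.
    rng = np.random.default_rng(1)
    W_traj = rng.normal(size=(d, T_f * 2)) * 0.1
    trajs = (mode_feats @ W_traj).reshape(-1, T_f, 2)
    logits = mode_feats.sum(axis=-1)                        # toy confidence logits
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return trajs, probs

rng = np.random.default_rng(0)
ctx = rng.normal(size=(30, 8))     # encoded scene tokens
queries = rng.normal(size=(6, 8))  # K = 6 trajectory modes
trajs, probs = multimodal_decode(ctx, queries)
print(trajs.shape, probs.shape)  # (6, 12, 2) (6,)
```

In trained models the queries, projections, and scoring head are learned end-to-end; the sketch only shows why the output is a set of scored trajectories rather than one.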