2022
DOI: 10.48550/arxiv.2208.01582
Preprint
ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries

Abstract: Existing autonomous driving pipelines separate the perception module from the prediction module. The two modules communicate via hand-picked features such as agent boxes and trajectories as interfaces. Due to this separation, the prediction module only receives partial information from the perception module. Even worse, errors from the perception modules can propagate and accumulate, adversely affecting the prediction results. In this work, we propose ViP3D, a visual trajectory prediction pipeline that leverag…
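The hand-picked interface the abstract describes can be sketched as follows. This is a minimal illustration of the modular perception-to-prediction handoff, not code from ViP3D or any cited work; all class and function names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Illustrative sketch of the hand-picked interface between a conventional
# perception module and a downstream prediction module. Names (AgentBox,
# perception, prediction) are hypothetical, not from ViP3D.

@dataclass
class AgentBox:
    """A detected agent: the only information prediction receives."""
    x: float
    y: float
    heading: float

def perception(sensor_frames) -> List[AgentBox]:
    # Placeholder: a real module would run detection/tracking on images.
    return [AgentBox(1.0, 2.0, 0.0)]

def prediction(agents: List[AgentBox]) -> List[List[Tuple[float, float]]]:
    # Placeholder: extrapolate each box forward; real modules use
    # learned motion models.
    return [[(a.x + t, a.y) for t in range(3)] for a in agents]

# The modular pipeline: prediction sees only the boxes, so perception
# errors propagate and richer visual cues are discarded at the interface.
trajs = prediction(perception(sensor_frames=None))
print(trajs)  # [[(1.0, 2.0), (2.0, 2.0), (3.0, 2.0)]]
```

An end-to-end design like the one the abstract proposes replaces this box-only interface with learned queries carrying visual features, so downstream prediction is not limited to the hand-picked fields above.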

Cited by 2 publications (5 citation statements)
References 44 publications
“…With a lower ADE and FDE, PF-Track has better trajectory quality. Our conclusions agree with previous studies [17,38,62]. More specifically, LSTM is a shallow model, unable to capture meaningful dynamics from noisy tracks; the stronger VectorNet can perform better than the other baselines but it is still worse than forecasting trajectories in an end-to-end framework, as proposed in our method.…”
Section: Length of Prediction In (supporting; confidence: 89%)
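The ADE and FDE metrics referenced in the statement above are standard trajectory-quality measures: ADE is the mean L2 displacement over all predicted timesteps, FDE the displacement at the final timestep. A minimal sketch (function names illustrative, not from the PF-Track or ViP3D codebases):

```python
import numpy as np

def ade(pred: np.ndarray, gt: np.ndarray) -> float:
    """Average Displacement Error: mean L2 distance over all timesteps.
    pred, gt: arrays of shape (T, 2) holding (x, y) waypoints."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def fde(pred: np.ndarray, gt: np.ndarray) -> float:
    """Final Displacement Error: L2 distance at the last timestep."""
    return float(np.linalg.norm(pred[-1] - gt[-1]))

gt = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])      # straight path
pred = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])    # drifting path
print(ade(pred, gt))  # (0 + 1 + 2) / 3 = 1.0
print(fde(pred, gt))  # 2.0
```

Lower values on both metrics indicate better predicted trajectories, which is the sense in which the excerpt compares PF-Track against the LSTM and VectorNet baselines.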
“…In the prediction literature, on the other hand, it is common to assume the availability of ground truth object trajectories and HD maps [3,7,12,66]. A few attempts for a more realistic evaluation have been made [17,22], focusing only on the prediction performance.…”
Section: Introduction (mentioning; confidence: 99%)
“…Recent methods follow this concept to output planning results for the ego car given sensor inputs [17,18,25,53,63]. Most of them follow a conventional pipeline of perception [21,32,33,57,65], prediction [11,14,36,66], and planning [22,23,54,67]. They usually first perform BEV perception to extract relevant information (e.g., 3D agent boxes, semantic maps, tracklets) and then exploit them to infer future trajectories of agents and the ego vehicle.…”
Section: Related Work (mentioning; confidence: 99%)
“…The following methods incorporated more data [63] or extracted more intermediate features [17,18,25] to provide more information for the planner, which achieved remarkable performance. Most methods only model object motions and cannot capture the fine-grained structural and semantic information of the surroundings [11,14,24,25,66]. Differently, we propose a world model to predict the evolution of both the surrounding dynamic and static elements.…”
Section: Related Work (mentioning; confidence: 99%)