2021 18th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP)
DOI: 10.1109/iccwamtip53232.2021.9674096

Spatial-Temporal Attentive Motion Planning Network for Autonomous Vehicles

Cited by 10 publications (10 citation statements); references 10 publications.

“…Although these methods deal with learning attentive features, they are mostly applied to classification and segmentation tasks and do not account for simultaneously acquiring spatiotemporal attention in sequential decision-making problems. Recently, STAMPNet [24] applied the squeeze-and-excitation (SE) module [49] in its feature extractor and a 3D-ResNet [53] to learn attended intermediate features of the video and trajectory history for trajectory planning. Following this work, we introduce the SE module into our 3D-CNN feature extractor, but instead of a single backbone we use a Siamese backbone to simultaneously learn intermediate spatiotemporal features that are invariant across the (front and top) driving views.…”
Section: Attention-based Methods
Citation type: mentioning (confidence: 99%)
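
To make the mechanism in the quoted passage concrete, below is a minimal sketch of a squeeze-and-excitation block attached to 3D convolutional features, assuming PyTorch; the class name SEBlock3D, the reduction ratio, and the weight-shared usage shown at the end are illustrative assumptions, not code from STAMPNet [24] or the citing work.

import torch
import torch.nn as nn

class SEBlock3D(nn.Module):
    # Channel attention over a (batch, channels, time, height, width)
    # feature map, in the spirit of the SE module [49].
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)  # squeeze: global spatiotemporal pooling
        self.fc = nn.Sequential(             # excitation: bottleneck -> per-channel gates
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c = x.shape[:2]
        w = self.pool(x).view(b, c)          # (B, C) channel descriptors
        w = self.fc(w).view(b, c, 1, 1, 1)   # (B, C, 1, 1, 1) gates
        return x * w                         # recalibrate the channels

# A Siamese arrangement, as described above, would pass both views
# through one weight-shared extractor (a hypothetical `extractor`
# containing such SE blocks):
#   feat_front = extractor(front_clip)
#   feat_top   = extractor(top_clip)
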
“…This section gives a general overview of the proposed ViSTAMPCNet for autonomous vehicles, shown in Figure 1. In general, ViSTAMPCNet is based on imitation learning with CNN-LSTM models [16,22,24], which map expert observations to view-invariant spatiotemporal representations. These representations are then used for driving decision making, i.e., generating the driving control command and the future trajectory.…”
Section: Overview
Citation type: mentioning (confidence: 99%)
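
As a concrete reading of the quoted pipeline, the following is a minimal sketch of a CNN-LSTM imitation-learning head of the kind cited [16,22,24], assuming PyTorch; the module name, layer sizes, and the two output branches (control command and future waypoints) are illustrative assumptions, not the published ViSTAMPCNet configuration.

import torch
import torch.nn as nn

class CNNLSTMPlanner(nn.Module):
    # Summarizes per-frame CNN features with an LSTM and maps the final
    # hidden state to two branches: a driving control command and a
    # future trajectory of (x, y) waypoints.
    def __init__(self, feat_dim=512, hidden=256, horizon=10):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.control_head = nn.Linear(hidden, 2)            # e.g. steering, speed
        self.traj_head = nn.Linear(hidden, horizon * 2)     # horizon (x, y) waypoints
        self.horizon = horizon

    def forward(self, feats):
        # feats: (batch, time, feat_dim) spatiotemporal features from the
        # (view-invariant) CNN extractor.
        _, (h, _) = self.lstm(feats)
        h = h[-1]                                           # last layer's final state
        control = self.control_head(h)                      # (B, 2)
        traj = self.traj_head(h).view(-1, self.horizon, 2)  # (B, horizon, 2)
        return control, traj
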