2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition 2018
DOI: 10.1109/cvpr.2018.00635

MX-LSTM: Mixing Tracklets and Vislets to Jointly Forecast Trajectories and Head Poses

Abstract: Recent approaches to trajectory forecasting use tracklets to predict the future positions of pedestrians, exploiting Long Short-Term Memory (LSTM) architectures. This paper shows that adding vislets, that is, short sequences of head pose estimations, significantly increases trajectory forecasting performance. We then propose to use vislets in a novel framework called MX-LSTM, capturing the interplay between tracklets and vislets thanks to a joint unconstrained optimization of full covariance matric…
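The abstract credits part of MX-LSTM's behavior to a "joint unconstrained optimization of full covariance matric…". A common way to optimize a full covariance matrix over unconstrained real numbers, while guaranteeing it stays symmetric positive definite, is a Cholesky parameterization: predict a lower-triangular factor L whose diagonal is passed through exp(), and set Sigma = L L^T. The sketch below shows that technique for a Gaussian negative log-likelihood. It is an assumption for illustration, not necessarily the paper's exact formulation; the helper names `build_cholesky` and `gaussian_nll` and the 4-D target layout (2-D position stacked with a 2-D head-pose vector) are hypothetical.

```python
import math
from typing import List, Sequence


def build_cholesky(raw: Sequence[float], dim: int) -> List[List[float]]:
    """Hypothetical helper: map dim*(dim+1)/2 unconstrained reals to a
    lower-triangular factor L with a strictly positive diagonal, so that
    Sigma = L @ L.T is always a valid full covariance matrix."""
    expected = dim * (dim + 1) // 2
    if len(raw) != expected:
        raise ValueError(f"expected {expected} raw values, got {len(raw)}")
    L = [[0.0] * dim for _ in range(dim)]
    k = 0
    for i in range(dim):
        for j in range(i + 1):
            # exp() keeps diagonal entries positive; off-diagonal entries are free.
            L[i][j] = math.exp(raw[k]) if i == j else raw[k]
            k += 1
    return L


def gaussian_nll(target: Sequence[float], mean: Sequence[float],
                 raw: Sequence[float]) -> float:
    """Hypothetical helper: negative log-likelihood of `target` under
    N(mean, L L^T), with L built from unconstrained values `raw`."""
    dim = len(mean)
    L = build_cholesky(raw, dim)
    diff = [t - m for t, m in zip(target, mean)]
    # Forward substitution solves L z = diff, so z.z = diff^T Sigma^{-1} diff.
    z = [0.0] * dim
    for i in range(dim):
        s = diff[i] - sum(L[i][j] * z[j] for j in range(i))
        z[i] = s / L[i][i]
    mahalanobis = sum(v * v for v in z)
    # log|Sigma| = 2 * sum(log L_ii) for Sigma = L L^T.
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(dim))
    return 0.5 * (mahalanobis + log_det + dim * math.log(2.0 * math.pi))


# Example (assumed layout): (x, y) position stacked with a 2-D head-pose vector,
# so dim = 4 and the network would emit 4 * 5 / 2 = 10 unconstrained values.
nll = gaussian_nll([1.0, 2.0, 0.6, 0.8], [1.1, 1.9, 0.5, 0.9], [0.0] * 10)
```

Because every real-valued `raw` vector yields a valid covariance, a gradient-based optimizer can update these values freely, with no projection or constraint handling; the off-diagonal entries of L are what let position and head-pose errors be correlated.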

Cited by 126 publications (126 citation statements). References: 59 publications.
“…To predict a future trajectory of pedestrians from first-person videos, temporal changes of orientation and body pose are encoded as one of the features in [45]. In parallel, [13] uses head pose as a proxy to build a better forecasting model. Both methods find that gaze, inferred by the body or head orientation, and … [Figure 2: Given a sequence of images, the GRE visually analyzes spatial behavior of road users and their temporal interactions with respect to environments.]…”
Section: Related Work (mentioning, confidence: 99%)
“…Such technologies require advanced decision making and motion planning systems that rely on estimates of the future position of road users in order to realize safe and effective mitigation and navigation strategies. Related research [46,1,36,23,37,12,13,43,45,32,33,47] has attempted to predict future trajectories by focusing on social conventions, environmental factors, or pose and motion constraints. These approaches have been shown to be more effective when the prediction model learns to extract these features by considering human-human (i.e., between road agents) or human-space (i.e., between a road agent and environment) interactions.…”
Section: Introduction (mentioning, confidence: 99%)
“…Alternatively, some approaches use temporal convolutional networks for encoding sequences of past locations [12], [13], allowing for faster run-times. In addition to location co-ordinates, some approaches also incorporate auxiliary information such as the head pose of pedestrians [9], [14] while encoding past motion. Many approaches jointly model the past motion of multiple agents in the scene to capture interaction between agents [5], [15], [12], [10], [7], [11].…”
Section: Related Studies (mentioning, confidence: 99%)
“…Supervised learning techniques have been applied to predict the movements of agents over a temporal horizon. For example, sequence models that use Long Short-Term Memory (LSTM) recurrent neural networks, like Social LSTM [1] and others [12,13], are capable of encoding Human-Robot and Human-Human interactions to improve the predictions. Other techniques are based on generative models, like Social-GAN [11] or SoPhie [21], that use pooling modules and attention modules.…”
Section: Navigation Based On Deep Reinforcement Learning (mentioning, confidence: 99%)