2020
DOI: 10.3390/a13120331

Predicting Intentions of Pedestrians from 2D Skeletal Pose Sequences with a Representation-Focused Multi-Branch Deep Learning Network

Abstract: Understanding the behaviors and intentions of humans is still one of the main challenges for vehicle autonomy. More specifically, inferring the intentions and actions of vulnerable actors, namely pedestrians, in complex situations such as urban traffic scenes remains a difficult task and a blocking point towards more automated vehicles. Answering the question “Is the pedestrian going to cross?” is a good starting point in order to advance in the quest to the fifth level of autonomous driving. In this paper, we…

citations
Cited by 26 publications
(13 citation statements)
references
References 87 publications
“…In [ 11 ], pedestrian 2D pose keypoints sequence from a specific time interval (≈0.5 s) is used as an input to classify, with a random forest model, the pedestrian crossing action being performed. In [ 12 ], precomputed features extracted from the complete sequence of pose keypoints are used as input for a multi-branch 2D CNN network. In [ 13 ], whole sequences of pedestrian’s keypoints are used in a similar way to the previous one, but generating adjacency matrix representations based on the pose graph, as an approximation to graph learning.…”
Section: Related Work (mentioning)
confidence: 99%
“…The Joint Attention in Autonomous Driving (JAAD) dataset [110] was used in the research studies [12], [51], and [71]. This dataset focuses on pedestrian and driver activity at crossings and the factors that affect it.…”
Section: Figure 2 Challenges For Behavior Prediction Of Traffic Actors Based On Input Representation RQ 3 What Are the Different Datasets (mentioning)
confidence: 99%
“…Dataset Name Application and Purpose Feature [12], [51], [71] Joint Attention in Autonomous Driving (JAAD) dataset…”
Section: Reference (mentioning)
confidence: 99%
“…This breakthrough has stimulated the skeletal modality interest since it proved to be sufficient to describe and understand the motion of a given action without any background context. This has made pose-based action recognition preferred over other modalities on a huge amount of real-time scenarios for human action recognition such as human-robot interaction [24], [3], medical rehabilitative applications [25], [8] or pedestrian action prediction [12], [11], [13]. Some commonly used learning architectures for pose-based action recognition include 1D/2D convolutional networks [9], [27], recurrent networks [1], [39], a combination of one of the latter with attention mechanisms [21], [16] or Graph-based models [56], [50].…”
Section: A Pose-based Action Recognition (mentioning)
confidence: 99%
“…All those methods present a drawback: they become sensitive to noise, background, and illumination conditions by including scene images in their approaches. To overcome these issues, intention prediction only based on 2D body poses sequences has been explored with various available learning architectures such as convolutions [11], recurrent cells [22], [13], graph-based models [4] and proposed to enhance pose-based approaches by creating features based on body structure to capture different aspects of the data [31], [12]. However, the lack of a common evaluation criterion, of normalized modalities inputs, of a common observation frames selection method, and common prediction horizons made the task of comparing each approach's robustness difficult if not impossible to realize.…”
Section: B Pedestrian Action Prediction (mentioning)
confidence: 99%