In this work, the problem of predicting a pedestrian’s intention to
cross the road is addressed using visual data captured from a camera.
The proposed ROS-based modular architecture consists of four modules:
Visual Perception, Intention Prediction, Planning, and Control.
The visual perception module is further divided
into three sub-modules. First, the pedestrian detection sub-module
detects the pedestrian and analyzes their state using motion and
looking classifiers. Second, the lane detection sub-module analyzes the
structured environment, which informs the road state classifiers. The
third sub-module extracts the
curvilinear localization states that are essential for the vehicle’s
motion planning and control. The intention prediction module is
integrated to capture the pedestrian’s intention to cross the road. In
this module, a comparative study is conducted between three different
data-driven sequential models. Each model is trained on the JAAD dataset
using different features extracted from the visual perception module.
The proposed GRU model achieves an average F1-score of 86% and can
predict a pedestrian’s intention three seconds before crossing.
To control the vehicle’s maneuvers, a Proportional-Integral (PI)
controller is implemented for longitudinal velocity control, braking
the vehicle to avoid collision with the pedestrian, and a Linderoth
controller is used for lateral motion control. Finally, the system is
validated on a 1:4 scale vehicle to demonstrate its applicability to
real hardware.
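
As an illustration of the longitudinal control idea described above, the
following is a minimal sketch of a discrete-time PI velocity controller
that brakes a simulated vehicle once a crossing intention is predicted.
It is not the authors' implementation: the class and function names, the
gains, the control period, the cruise speed, and the point-mass vehicle
model are all assumptions chosen only to make the example runnable.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Discrete-time PI controller with a clamped output (illustrative)."""
    kp: float               # proportional gain (assumed, not from the paper)
    ki: float               # integral gain (assumed, not from the paper)
    dt: float               # control period in seconds (assumed)
    out_min: float = -1.0   # normalized command: full brake
    out_max: float = 1.0    # normalized command: full throttle
    integral: float = 0.0   # accumulated speed error

    def step(self, target_speed: float, measured_speed: float) -> float:
        error = target_speed - measured_speed
        candidate = self.integral + error * self.dt
        u = self.kp * error + self.ki * candidate
        if self.out_min <= u <= self.out_max:
            # Simple anti-windup: integrate only while the output is unsaturated.
            self.integral = candidate
            return u
        return max(self.out_min, min(self.out_max, u))


def simulate_braking(brake_step: int, n_steps: int = 120) -> list[float]:
    """Simulate a point-mass vehicle that cruises, then brakes to a stop.

    brake_step is the (hypothetical) control step at which the intention
    prediction module reports that the pedestrian will cross.
    """
    dt = 0.05            # s, assumed control period
    cruise_speed = 2.0   # m/s, assumed speed for a small-scale vehicle
    max_accel = 3.0      # m/s^2 reached at a normalized command of 1.0 (assumed)
    controller = PIController(kp=0.8, ki=0.3, dt=dt)

    speed = cruise_speed
    speeds = []
    for k in range(n_steps):
        # Target speed drops to zero once a crossing intention is predicted.
        target = 0.0 if k >= brake_step else cruise_speed
        u = controller.step(target, speed)
        # The vehicle cannot reverse in this toy model, so clamp at zero.
        speed = max(0.0, speed + u * max_accel * dt)
        speeds.append(speed)
    return speeds


if __name__ == "__main__":
    trace = simulate_braking(brake_step=20)
    print(f"speed before braking: {trace[0]:.2f} m/s")
    print(f"speed at end of run:  {trace[-1]:.2f} m/s")
```

In this sketch the prediction horizon matters because the stopping
distance depends on how early the target speed is set to zero; an
earlier `brake_step` leaves more margin before the pedestrian enters the
road.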