Anticipating future situations from streaming sensor data is a key perception challenge for mobile robotics and automated vehicles. We address the problem of predicting the path of objects with multiple dynamic modes. The dynamics of such targets can be described by a Switching Linear Dynamical System (SLDS). However, predictions from this probabilistic model cannot anticipate when a change in dynamic mode will occur. We propose to extract various types of cues with computer vision to provide context on the target's behavior, and to incorporate these in a Dynamic Bayesian Network (DBN). The DBN extends the SLDS by conditioning the mode transition probabilities on additional context states. We describe efficient online inference in this DBN for probabilistic path prediction, accounting for uncertainty in both measurements and target behavior. Our approach is illustrated on two scenarios in the Intelligent Vehicles domain concerning pedestrians and cyclists, so-called Vulnerable Road Users (VRUs). Here, context cues include the static environment of the VRU, its dynamic environment, and its observed actions. Experiments using stereo vision data from a moving vehicle demonstrate that the proposed approach yields more accurate path predictions than the SLDS alone at the relevant short time horizon (1 s), while slightly outperforming a computationally more demanding state-of-the-art method.
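To make the context-conditioned mode switching concrete, the sketch below shows a single prediction step of an SLDS whose mode transition matrix is selected by a discrete context state, using standard IMM-style moment matching. All dimensions, matrices, mode semantics, and context values here are illustrative assumptions, not the model fitted in the paper.

```python
import numpy as np

# Two dynamic modes for a 1D position/velocity state, e.g. "moving" and
# "stopping". All matrices below are illustrative assumptions.
A = [np.array([[1.0, 0.1], [0.0, 1.0]]),    # mode 0: constant velocity
     np.array([[1.0, 0.1], [0.0, 0.5]])]    # mode 1: decelerating
Q = [0.01 * np.eye(2), 0.01 * np.eye(2)]    # per-mode process noise

# P(m_t | m_{t-1}, context): one transition matrix per context value,
# e.g. context 1 = "situation looks critical to the VRU" (assumed).
TRANS = {0: np.array([[0.95, 0.05], [0.10, 0.90]]),
         1: np.array([[0.60, 0.40], [0.05, 0.95]])}

def predict_step(mode_probs, means, covs, context):
    """One IMM-style prediction step: mix the per-mode Gaussians according
    to the context-dependent mode transitions, then propagate each mode."""
    T = TRANS[context]
    c = T.T @ mode_probs                            # predicted mode beliefs
    mix = (T * mode_probs[:, None]) / c[None, :]    # mix[i, j] = P(m_{t-1}=i | m_t=j)
    new_means, new_covs = [], []
    for j in range(len(A)):
        m0 = sum(mix[i, j] * means[i] for i in range(len(A)))
        P0 = sum(mix[i, j] * (covs[i] + np.outer(means[i] - m0, means[i] - m0))
                 for i in range(len(A)))
        new_means.append(A[j] @ m0)
        new_covs.append(A[j] @ P0 @ A[j].T + Q[j])
    return c, new_means, new_covs
```

Iterating this step over the prediction horizon yields a Gaussian mixture over future positions whose mode weights react to the context cues, which is what lets the model anticipate mode changes that a plain SLDS cannot.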
We learn motion models for cyclist path prediction on real-world tracks obtained from a moving vehicle, and propose to exploit the local road topology to obtain better predictive distributions. The tracks are extracted from the Tsinghua-Daimler Cyclist Benchmark for cyclist detection and corrected for vehicle ego-motion. The tracks are then spatially aligned to local curves and crossings in the road. We study a standard Kalman Filter approach for path prediction from the literature, as well as a mixture of specialized filters associated with specific road orientations at junctions. Our experiments demonstrate that mixing specialized motion models for canonical directions, combined with prior knowledge of the road topology, improves prediction accuracy (by up to 20% on sharp turns). The new track data complements the existing video, disparity, and annotation data of the original benchmark, and will be made publicly available.
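A minimal sketch of the mixture idea follows: one constant-velocity Kalman filter prediction per canonical road direction at a junction, weighted by a prior over the road topology. The redirection rule, noise levels, and weights are hand-set assumptions; in the paper the specialized motion models are learned from the aligned tracks.

```python
import numpy as np

DT = 0.1  # assumed frame interval [s]
F = np.block([[np.eye(2), DT * np.eye(2)],
              [np.zeros((2, 2)), np.eye(2)]])     # constant-velocity model
Q = 0.05 * np.eye(4)                              # process noise (assumed)

def canonical_direction_mixture(x, P, priors, road_angles, horizon=10):
    """Predict a mixture of constant-velocity filters, one per canonical
    road direction. Each component redirects the current speed along its
    road orientation; the weights come from a road-topology prior."""
    components = []
    speed = np.linalg.norm(x[2:])
    for w, theta in zip(priors, road_angles):
        xi, Pi = x.copy(), P.copy()
        xi[2:] = speed * np.array([np.cos(theta), np.sin(theta)])
        for _ in range(horizon):
            xi = F @ xi
            Pi = F @ Pi @ F.T + Q
        components.append((w, xi, Pi))
    return components                              # weighted Gaussian mixture

# Example: a junction with straight-on and right-turn options.
x0 = np.array([0.0, 0.0, 4.0, 0.0])               # cyclist at 4 m/s heading east
P0 = 0.1 * np.eye(4)
mix = canonical_direction_mixture(x0, P0, priors=[0.7, 0.3],
                                  road_angles=[0.0, -np.pi / 2])
```

The resulting multi-modal predictive distribution covers both the straight-on and turning hypotheses, instead of averaging them into a single implausible path as one filter would.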
Next-generation automotive radars provide elevation data in addition to range, azimuth, and Doppler velocity. In this experimental study, we apply a state-of-the-art object detector (PointPillars), previously used for LiDAR 3D data, to such 3+1D radar data (where 1D refers to Doppler). In ablation studies, we first explore the benefits of the additional elevation information, together with that of Doppler, radar cross section, and temporal accumulation, in the context of multi-class road user detection. We subsequently compare object detection performance on the radar and LiDAR point clouds, per object class and as a function of distance. To facilitate our experimental study, we present the novel View-of-Delft (VoD) automotive dataset. It contains 8693 frames of synchronized and calibrated 64-layer LiDAR, (stereo) camera, and 3+1D radar data acquired in complex urban traffic, with 123106 3D bounding box annotations of both moving and static objects, including 26587 pedestrian, 10800 cyclist, and 26949 car labels. Our results show that object detection on 64-layer LiDAR data still outperforms that on 3+1D radar data, but the addition of elevation information and the integration of successive radar scans help close the gap. The VoD dataset is made freely available for scientific benchmarking at https://intelligent-vehicles.org/datasets/view-of-delft/.
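The temporal accumulation step can be sketched as follows: past radar scans are transformed into the current vehicle frame using the ego poses and concatenated, with a per-point age feature. The array layout, names, and frame interval are assumptions; in practice the Doppler values would also need ego-motion compensation, which is omitted here.

```python
import numpy as np

def accumulate_radar_scans(scans, poses, num_frames=5, dt=0.1):
    """Merge the latest `num_frames` radar scans into the current vehicle
    frame before feeding them to the detector. `scans[i]` is an (N_i, 5)
    array of [x, y, z, doppler, rcs] points and `poses[i]` a 4x4 ego pose
    in a common world frame; this layout is an illustrative assumption."""
    ref_inv = np.linalg.inv(poses[-1])             # world -> current frame
    merged = []
    for k, (pts, pose) in enumerate(zip(scans[-num_frames:],
                                        poses[-num_frames:])):
        T = ref_inv @ pose                         # scan frame -> current frame
        xyz = np.c_[pts[:, :3], np.ones(len(pts))] @ T.T
        age = np.full((len(pts), 1), (num_frames - 1 - k) * dt)
        merged.append(np.hstack([xyz[:, :3], pts[:, 3:], age]))
    return np.vstack(merged)   # (sum N_i, 6): x, y, z, doppler, rcs, age
```

Densifying the sparse radar point cloud this way is what the study finds helps close part of the detection gap to 64-layer LiDAR.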
This paper presents our research platform SafeVRU for the interaction of self-driving vehicles with Vulnerable Road Users (VRUs, i.e., pedestrians and cyclists). The paper details the design (implemented with a modular structure within ROS) of the full stack of vehicle localization, environment perception, motion planning, and control, with emphasis on the environment perception and planning modules. The environment perception detects the VRUs using a stereo camera and predicts their paths with Dynamic Bayesian Networks (DBNs), which can account for switching dynamics. The motion planner is based on model predictive contouring control (MPCC) and takes into account vehicle dynamics, control objectives (e.g., desired speed), and the perceived environment (i.e., the predicted VRU paths with behavioral uncertainties) over a certain time horizon. We present simulation and real-world results to illustrate the ability of our vehicle to plan and execute collision-free trajectories in the presence of VRUs.

I. INTRODUCTION

Every year between 20 and 50 million people are involved in road accidents, mostly caused by human errors [1]. According to [1], approximately 1.3 million people lost their lives in these accidents. Half of the victims are vulnerable road users (VRUs), such as pedestrians and cyclists. Self-driving vehicles can help reduce these fatalities [2]. Active safety features, such as autonomous emergency braking (AEB), are increasingly found on board vehicles on the market to improve VRUs' safety (see [3] for a recent overview). In addition, some vehicles already automate steering functionality (e.g., [4], [5]), but still require the driver to initiate the maneuver. Major challenges must be addressed to ensure safety and performance while driving in complex urban environments [6], where VRUs are present. The self-driving vehicle should be aware of the presence of the VRUs and be able to infer their intentions, so that it can plan its path accordingly to avoid collisions. In this respect, motion planning methods are required to provide safe (collision-free) and system-compliant performance in complex environments with static and moving obstacles (refer to [7], [8] for an overview). In real-world applications, the information on the pose (i.e., position and orientation) of other traffic participants comes from a perception module. The perception module should provide the planner with information not only on the current position of the other road users, but also on their predicted future paths.

† The authors contributed equally to the paper.
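As a rough illustration of the contouring objective such an MPCC planner minimizes at each stage, the sketch below penalizes the lateral (contouring) and longitudinal (lag) errors with respect to a reference path, plus deviation from a desired speed. The function signature, weights, and reference-path interface are assumptions; the vehicle dynamics and the VRU collision-avoidance constraints central to the paper's planner are not shown.

```python
import numpy as np

def mpcc_stage_cost(pos, v, theta, path, v_ref, weights=(10.0, 1.0, 0.5)):
    """One stage of a model predictive contouring control objective
    (illustrative sketch). `path(theta)` must return the reference point
    and unit tangent at path parameter theta; collision avoidance with
    the predicted VRU positions would enter as additional constraints."""
    p_ref, tangent = path(theta)
    normal = np.array([-tangent[1], tangent[0]])
    err = np.asarray(pos) - p_ref
    e_contour = normal @ err        # lateral deviation from the reference
    e_lag = tangent @ err           # error along the path direction
    w_c, w_l, w_v = weights
    return w_c * e_contour**2 + w_l * e_lag**2 + w_v * (v - v_ref)**2

# Example with a straight reference path along the x-axis.
straight = lambda theta: (np.array([theta, 0.0]), np.array([1.0, 0.0]))
cost = mpcc_stage_cost(pos=[5.2, 0.4], v=3.0, theta=5.0,
                       path=straight, v_ref=4.0)
```

Summing this cost over the horizon and optimizing it subject to the vehicle dynamics and the uncertainty-inflated VRU regions yields the collision-free trajectories demonstrated in the paper.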
This paper proposes a Recurrent Neural Network (RNN) for cyclist path prediction that learns the effect of contextual cues on behavior directly, in an end-to-end approach that removes the need for annotations. The proposed RNN incorporates three distinct contextual cues: one related to the actions of the cyclist, one related to the location of the cyclist on the road, and one related to the interaction between the cyclist and the ego-vehicle. The RNN predicts a Gaussian distribution over the position of the cyclist one second into the future with higher accuracy than a current state-of-the-art model based on dynamic mode annotations, attaining an average prediction error of 33 cm at that horizon.
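A minimal sketch of such a network, assuming a GRU encoder over per-frame features (past positions plus the three context cues) and a diagonal-Gaussian output head trained with the Gaussian negative log-likelihood; the feature layout, hidden size, and diagonal covariance are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CyclistRNN(nn.Module):
    """Maps a track of past positions plus context-cue features to a
    Gaussian over the cyclist's position 1 s ahead (illustrative sketch)."""
    def __init__(self, in_dim=8, hidden=64):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 4)        # 2D mean + 2 log-variances

    def forward(self, x):                       # x: (batch, time, in_dim)
        _, h = self.gru(x)
        out = self.head(h[-1])
        return out[:, :2], out[:, 2:].exp()     # mean, variance (positive)

# Training minimizes the Gaussian negative log-likelihood of the observed
# future position under the predicted distribution.
model = CyclistRNN()
x = torch.randn(16, 20, 8)                      # 16 tracks, 20 past frames
target = torch.randn(16, 2)                     # position 1 s ahead
mean, var = model(x)
loss = nn.GaussianNLLLoss()(mean, target, var)
loss.backward()
```

Because the cues enter as plain input features, their influence on the predicted distribution is learned end-to-end from the tracks, without the dynamic mode annotations the baseline requires.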