Instrumented and autonomous vehicles can generate very high volumes of video data per car per day, all of which must be annotated with a high degree of granularity, detail, and accuracy. Annotating video at this level of detail and volume, whether manually or automatically, is not a trivial task. Manual annotation is slow and expensive, whereas automatic annotation algorithms have improved significantly over the past few years. This demonstration presents an application of multi-object tracking, path prediction, and semantic segmentation approaches that facilitates multi-object video annotation for enriched tracklet extraction. Currently, these three approaches are used to support the annotation task, but more can and will be included in the future.
This work presents an analysis of predicting the future path of moving objects, observed from a moving camera in traffic scenes, with an LSTM architecture in a single-shot manner. Path prediction estimates the future locations of an object in a given space and is useful in applications such as surveillance, abnormal behaviour detection, crowd behaviour analysis, traffic control and, more recently, advanced driver assistance systems (ADAS) and collision avoidance systems. Typical approaches use the last t_obs positions of an object observed in video frames to predict its future path as a sequence of position values, so the problem can be treated as time-series forecasting. LSTM architectures are known to perform well on time series. We evaluate path prediction across three types of objects (pedestrians, vehicles and cyclists), four prediction horizons (5, 10, 15 and 20 frames ahead) and two perspectives (image coordinates and bird's-eye view). Using an LSTM architecture with relative tracklet positioning, the approach reached an Average Displacement Error (ADE) of 0.01 m for pedestrians, 0.06 m for vehicles and 0.02 m for cyclists, and an average Final Displacement Error (FDE) of between 0.016 m and 0.15 m for near-future prediction.
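The results above are reported as ADE and FDE. As a minimal sketch, the standard definitions can be computed as follows: ADE is the mean Euclidean distance between predicted and ground-truth positions over all predicted frames, and FDE is the distance at the final frame. The abstract does not specify what "relative tracklet positioning" means exactly; this sketch assumes positions are expressed as offsets from the last observed position. All function names here are hypothetical, and the toy numbers are illustrative only, not data from the work.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def to_relative(positions: Sequence[Point], origin: Point) -> List[Point]:
    """Express positions as offsets from an origin.

    Assumption: the origin is the last observed position of the tracklet.
    """
    ox, oy = origin
    return [(x - ox, y - oy) for x, y in positions]


def from_relative(offsets: Sequence[Point], origin: Point) -> List[Point]:
    """Invert to_relative by adding the origin back to each offset."""
    ox, oy = origin
    return [(dx + ox, dy + oy) for dx, dy in offsets]


def _check(pred: Sequence[Point], truth: Sequence[Point]) -> None:
    if not pred or len(pred) != len(truth):
        raise ValueError("pred and truth must be non-empty and equally long")


def ade(pred: Sequence[Point], truth: Sequence[Point]) -> float:
    """Average Displacement Error: mean Euclidean distance over all predicted frames."""
    _check(pred, truth)
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(pred)


def fde(pred: Sequence[Point], truth: Sequence[Point]) -> float:
    """Final Displacement Error: Euclidean distance at the last predicted frame."""
    _check(pred, truth)
    return math.dist(pred[-1], truth[-1])


if __name__ == "__main__":
    observed = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]  # toy tracklet, t_obs = 3
    truth = [(3.0, 0.0), (4.0, 0.0)]                 # toy 2-frame horizon
    origin = observed[-1]
    # A trained model would output these offsets; fixed toy values stand in here.
    pred_offsets = [(1.0, 0.0), (2.0, 1.0)]
    pred = from_relative(pred_offsets, origin)
    print(pred, ade(pred, truth), fde(pred, truth))
```

Predicting offsets rather than absolute coordinates keeps the regression targets small and independent of where the object sits in the frame, which is one plausible motivation for a relative encoding.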
This work presents an analysis of predicting multiple future paths of moving objects in traffic scenes by leveraging Long Short-Term Memory architectures (LSTMs) and Mixture Density Networks (MDNs) in a single-shot manner. Path prediction allows estimating the future positions of objects. This is useful in important applications such as security monitoring systems, Autonomous Driver Assistance Systems and assistive technologies. Typical approaches use observed positions (tracklets) of objects in video frames to predict their future paths as a sequence of position values, which can be treated as a time series. LSTMs have achieved good performance when dealing with time series. However, LSTMs are limited to predicting a single path per tracklet. Path prediction is not a deterministic task: the future is uncertain, so predictions should carry a measure of that uncertainty. Predicting multiple paths instead of a single one is therefore a more realistic way of approaching this task. In this work, a set of future paths with associated uncertainty is predicted by combining LSTMs and MDNs. The approach was evaluated on the KITTI and CityFlow datasets across three types of objects, four prediction horizons and two points of view (image coordinates and bird's-eye view).
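To illustrate how an MDN turns one network output into several weighted hypotheses with uncertainty, the sketch below assumes a hypothetical head that, for a single future 2D position, emits K mixture logits, K means and K per-axis standard deviations (diagonal Gaussians). It shows the mixture negative log-likelihood typically used as the training loss, ranking of component means as alternative predictions, and sampling. How the work actually connects the LSTM to the MDN, and whether mixtures are placed over single positions or whole paths, is not stated in the abstract; every name here is an illustrative assumption.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Log mixture weights from raw (hypothetical) MDN logits."""
    lse = logsumexp(logits)
    return [z - lse for z in logits]


def mdn_nll(logits: Sequence[float], means: Sequence[Point],
            sigmas: Sequence[Point], target: Point) -> float:
    """Negative log-likelihood of a 2D target under a mixture of diagonal Gaussians."""
    log_terms = []
    for lw, (mx, my), (sx, sy) in zip(log_softmax(logits), means, sigmas):
        zx = (target[0] - mx) / sx
        zy = (target[1] - my) / sy
        # log(w_k * N(x; mu_x, sx) * N(y; mu_y, sy)) for one component
        log_terms.append(lw - 0.5 * (zx * zx + zy * zy) - math.log(2 * math.pi * sx * sy))
    return -logsumexp(log_terms)


def ranked_hypotheses(logits: Sequence[float],
                      means: Sequence[Point]) -> List[Tuple[Point, float]]:
    """Component means sorted by mixture weight: one candidate prediction each."""
    weights = [math.exp(lw) for lw in log_softmax(logits)]
    order = sorted(range(len(weights)), key=lambda k: weights[k], reverse=True)
    return [(means[k], weights[k]) for k in order]


def sample_positions(logits: Sequence[float], means: Sequence[Point],
                     sigmas: Sequence[Point], n: int,
                     rng: random.Random) -> List[Point]:
    """Draw n positions: pick a component by weight, then sample its Gaussian."""
    weights = [math.exp(lw) for lw in log_softmax(logits)]
    samples = []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        (mx, my), (sx, sy) = means[k], sigmas[k]
        samples.append((rng.gauss(mx, sx), rng.gauss(my, sy)))
    return samples


if __name__ == "__main__":
    # Toy head output with K = 2 components, e.g. "continue straight" vs "turn".
    logits = [1.0, 0.0]
    means = [(2.0, 0.0), (1.5, 1.5)]
    sigmas = [(0.2, 0.2), (0.3, 0.3)]
    print(ranked_hypotheses(logits, means))
    print(mdn_nll(logits, means, sigmas, (2.0, 0.1)))
    print(sample_positions(logits, means, sigmas, n=5, rng=random.Random(0)))
```

The log-sum-exp form avoids underflow when a target lies far from every component, which would otherwise make the mixture density round to zero before taking the logarithm.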