For smart mobility, and autonomous vehicles (AV), it is necessary to have a very precise perception of the environment to guarantee reliable decision-making, and to be able to extend the results obtained for the road sector to other areas such as rail. To this end, we introduce a new single-stage monocular real-time 3D object detection convolutional neural network (CNN) based on YOLOv5, dedicated to smart mobility applications for both road and rail environments. To perform the 3D parameter regression, we replace YOLOv5’s anchor boxes with our hybrid anchor boxes. Our method is available in different model sizes such as YOLOv5: small, medium, and large. The new model that we propose is optimized for real-time embedded constraints (lightweight, speed, and accuracy) that takes advantage of the improvement brought by split attention (SA) convolutions called small split attention model (Small-SA). To validate our CNN model, we also introduce a new virtual dataset for both road and rail environments by leveraging the video game Grand Theft Auto V (GTAV). We provide extensive results of our different models on both KITTI and our own GTAV datasets. Through our results, we show that our method is the fastest available 3D object detection with accuracy results close to state-of-the-art methods on the KITTI road dataset. We further demonstrate that the pre-training process on our GTAV virtual dataset improves the accuracy on real datasets such as KITTI, thus allowing our method to obtain an even greater accuracy than state-of-the-art approaches with 16.16% 3D average precision on hard car detection with inference time of 11.1 ms/image on an RTX 3080 GPU.
In this paper, we will introduce our object detection, localization and tracking system for smart mobility applications like traffic road and railway environment. Firstly, an object detection and tracking approach was firstly carried out within two deep learning approaches: You Only Look Once (YOLO) V3 and Single Shot Detector (SSD). A comparison between the two methods allows us to identify their applicability in the traffic environment. Both the performances in road and in railway environments were evaluated. Secondly, object distance estimation based on Monodepth algorithm was developed. This model is trained on stereo images dataset but its inference uses monocular images. As the output data, we have a disparity map that we combine with the output of object detection. To validate our approach, we have tested two models with different backbones including VGG and ResNet used with two datasets : Cityscape and KITTI. As the last step of our approach, we have developed a new method-based SSD to analyse the behavior of pedestrian and vehicle by tracking their movements even in case of no detection on some images of a sequence. We have developed an algorithm based on the coordinates of the output bounding boxes of the SSD algorithm. The objective is to determine if the trajectory of a pedestrian or vehicle can lead to a dangerous situations. The whole of development is tested in real vehicle traffic conditions in Rouen city center, and with videos taken by embedded cameras along the Rouen tramway.
For smart mobility, autonomous vehicles, and advanced driver-assistance systems (ADASs), perception of the environment is an important task in scene analysis and understanding. Better perception of the environment allows for enhanced decision making, which, in turn, enables very high-precision actions. To this end, we introduce in this work a new real-time deep learning approach for 3D multi-object detection for smart mobility not only on roads, but also on railways. To obtain the 3D bounding boxes of the objects, we modified a proven real-time 2D detector, YOLOv3, to predict 3D object localization, object dimensions, and object orientation. Our method has been evaluated on KITTI’s road dataset as well as on our own hybrid virtual road/rail dataset acquired from the video game Grand Theft Auto (GTA) V. The evaluation of our method on these two datasets shows good accuracy, but more importantly that it can be used in real-time conditions, in road and rail traffic environments. Through our experimental results, we also show the importance of the accuracy of prediction of the regions of interest (RoIs) used in the estimation of 3D bounding box parameters.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.