Cityscapes 3D: Dataset and Benchmark for 9 DoF Vehicle Detection

Gählert, Nils; Jourdan, N.; Cordts, Marius; Franke, Uwe; Denzler, Joachim

doi:10.48550/arxiv.2006.07864

Cited by 8 publications

(8 citation statements)

References 23 publications

“…Images are collected from a fixed camera and obtained through consecutive frames. To obtain more pedestrian images in traffic scenes, we filter and crop out pedestrian images from Cityscapes [45][46][47], which is a very large data set system, so there are more types of pedestrians. Our experiment will be conducted on these data sets and compared with the current state-of-the-art methods.…”

Section: Methods (mentioning)

confidence: 99%

GRAN: graph recurrent attention network for pedestrian orientation classification

Shan, Liu, et al. 2022

Complex Intell. Syst.

In complex traffic scenes, accurate identification of pedestrian orientations can help drivers determine pedestrian trajectories and help reduce traffic accidents. However, there are still many challenges in pedestrian orientation recognition. First, due to the irregular appearance of pedestrians, it is difficult for general Convolutional Neural Networks (CNNs) to extract discriminative features. In addition, more features of body parts help to judge the orientation of pedestrians. For example, head, arms and legs. However, they are usually small and not conducive to feature extraction. Therefore, in this work, we use several discrete values to define the orientation of pedestrians, and propose a Gated Graph Neural Network (GGNN)-based Graph Recurrent Attention Network (GRAN) to classify the orientation of pedestrians. The contributions are as follows: (1) We construct a body parts graph consisting of head, arms and legs on the feature maps output by the CNN backbone. (2) Mining the dependencies between body parts on the graph via the proposed GRAN, and utilizing the encoder–decoder to propagate features among graph nodes. (3) In this process, we propose an adjacency matrix with attention edge weights to dynamically represent graph node relationships, and the edge weights are learned during network training. To evaluate the proposed method, we conduct experiments on three different benchmarks (PDC, PDRD, and Cityscapes) with 8, 3, and 4 orientations, respectively. Note that the orientation labels for PDRD and Cityscapes are annotated by our hand. The proposed method achieves 97%, 91% and 90% classification accuracy on the three data sets, respectively. The results are all higher than current state-of-the-art methods, which demonstrate the effectiveness of the proposed method.

“…We collected and analyzed commonly used datasets for 3D reconstruction in Tables 1-3.
[302] | 24 megapixels | images, 3D point cloud | / | /
Semantic3D [303] | 4 billion points | images, 3D point cloud | 30 | 8 classes
Paris-Lille-3D [304] | 57.79 million | images, 3D point cloud | 2 | 50 classes
ApolloCar3D [305] | 5277 | images | / | 60k
Cityscapes 3D [306] | 5000 | images, 3D point cloud | / | 8 classes
BlendedMVS [307] | 17k | images, 3D meshes | 113 | /
CSPC-Dataset [308] | 68 million points | images, 3D point cloud | 5 | 6 classes
Toronto-3D [309] | 78.3 million points | images, 3D point cloud | / | 8 classes
STPLS3D [310] | 16 km² | images, 3D point cloud | / | /
KITTI-360 [311] | 300k, 1 billion points | images, 3D point cloud | / | /
DiTer [312] | / | images, 3D point cloud | / | /
SubT-MRS [313] | 30 scenes | images, 3D point cloud | 30 | /…”

Section: Datasets (mentioning)

confidence: 99%

A Comprehensive Review of Vision-Based 3D Reconstruction Methods

Zhou, Wu, Zuo, et al. 2024

Sensors

With the rapid development of 3D reconstruction, especially the emergence of algorithms such as NeRF and 3DGS, 3D reconstruction has become a popular research topic in recent years. 3D reconstruction technology provides crucial support for training extensive computer vision models and advancing the development of general artificial intelligence. With the development of deep learning and GPU technology, the demand for high-precision and high-efficiency 3D reconstruction information is increasing, especially in the fields of unmanned systems, human-computer interaction, virtual reality, and medicine. The rapid development of 3D reconstruction is becoming inevitable. This survey categorizes the various methods and technologies used in 3D reconstruction. It explores and classifies them based on three aspects: traditional static, dynamic, and machine learning. Furthermore, it compares and discusses these methods. At the end of the survey, which includes a detailed analysis of the trends and challenges in 3D reconstruction development, we aim to provide a comprehensive introduction for individuals who are currently engaged in or planning to conduct research on 3D reconstruction. Our goal is to help them gain a comprehensive understanding of the relevant knowledge related to 3D reconstruction.

“…Cityscape has high-quality pixel-level annotations of 5k frames and is intended for the evaluation of semantic urban scene understanding tasks. Cityscapes 3D [46] is a new extension of the original dataset with 3D bounding box annotations for 3D object detection, for example. The Pascal Visual Object Classes (PascalVOC) [31] challenge is not only a dataset but also an annual competition and workshop.…”

Section: Related Work (mentioning)

confidence: 99%

Road and Railway Smart Mobility: A High-Definition Ground Truth Hybrid Dataset

Khemmar, Mauri, Dulompont, et al. 2022

Sensors

A robust visual understanding of complex urban environments using passive optical sensors is an onerous and essential task for autonomous navigation. The problem is heavily characterized by the quality of the available dataset and the number of instances it includes. Regardless of the benchmark results of perception algorithms, a model would only be reliable and capable of enhanced decision making if the dataset covers the exact domain of the end-use case. For this purpose, in order to improve the level of instances in datasets used for the training and validation of Autonomous Vehicles (AV), Advanced Driver Assistance Systems (ADAS), and autonomous driving, and to reduce the void due to the no-existence of any datasets in the context of railway smart mobility, we introduce our multimodal hybrid dataset called ESRORAD. ESRORAD is comprised of 34 videos, 2.7 k virtual images, and 100 k real images for both road and railway scenes collected in two Normandy towns, Rouen and Le Havre. All the images are annotated with 3D bounding boxes showing at least three different classes of persons, cars, and bicycles. Crucially, our dataset is the first of its kind with uncompromised efforts on being the best in terms of large volume, abundance in annotation, and diversity in scenes. Our escorting study provides an in-depth analysis of the dataset’s characteristics as well as a performance evaluation with various state-of-the-art models trained under other popular datasets, namely, KITTI and NUScenes. Some examples of image annotations and the prediction results of our 3D object detection lightweight algorithms are available in ESRORAD dataset. Finally, the dataset is available online. This repository consists of 52 datasets with their respective annotations performed.
