ABSTRACT: Obtaining accurate 3D descriptions in the thermal infrared (TIR) is a challenging task due to the low geometric resolution of TIR cameras and the low number of strong features in TIR images. Combining the radiometric information of the thermal infrared with 3D data from another sensor can overcome most of these limitations in 3D geometric accuracy. For dynamic scenes with moving objects or a moving sensor system, a combination with RGB cameras or Time-of-Flight (TOF) cameras is suitable. As a TOF camera is an active sensor in the near infrared (NIR) and the thermal infrared camera captures the radiation emitted by the objects in the observed scene, the combination of these two sensors for close-range applications is independent of external illumination or textures in the scene. This article focuses on the fusion of data acquired with both a TOF camera and a TIR camera. As the radiometric behaviour of many objects differs between the near infrared used by the TOF camera and the thermal infrared spectrum, a direct co-registration with feature points in both intensity images leads to a high number of outliers. A fully automatic workflow for the geometric calibration of both cameras and the relative orientation of the camera system with one calibration pattern usable in both spectral bands is presented. Based on the relative orientation, a fusion of the TOF depth image and the TIR image is used for scene segmentation and people detection. An adaptive histogram-based depth level segmentation of the 3D point cloud is combined with a thermal intensity-based segmentation. The feasibility of the proposed method is demonstrated in an experimental setup with different geometric and radiometric influences, showing the benefit of combining TOF intensity and depth images with thermal infrared images.
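The combined segmentation described above can be illustrated with a minimal sketch: the depth histogram of the co-registered TOF image is split at its valleys into depth levels, and within each level only the warm regions of the TIR image are kept. The array names, bin count, and thresholds below are illustrative assumptions, not the authors' parameters.

```python
# Minimal sketch of histogram-based depth level segmentation combined with a
# thermal intensity threshold; parameter values are assumptions for illustration.
import numpy as np
from scipy.signal import find_peaks

def segment_depth_levels(depth, tir, min_valley_distance=10, tir_threshold=0.6):
    """Split a co-registered TOF depth image into depth levels and keep the
    warm regions of each level according to the normalised TIR image."""
    valid = ~np.isnan(depth)                      # invalid pixels assumed NaN
    hist, edges = np.histogram(depth[valid], bins=256)

    # Valleys of the depth histogram separate adjacent depth levels.
    valleys, _ = find_peaks(-hist, distance=min_valley_distance)
    boundaries = np.concatenate(([edges[0]], edges[valleys], [edges[-1]]))

    segments = []
    for lo, hi in zip(boundaries[:-1], boundaries[1:]):
        level_mask = valid & (depth >= lo) & (depth < hi)
        # Combine the depth level with the thermal intensity segmentation.
        warm_mask = level_mask & (tir > tir_threshold)
        if warm_mask.any():
            segments.append(warm_mask)
    return segments
```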
ABSTRACT: Recently, several synthetic image datasets of street scenes have been published. These datasets contain various traffic signs and can therefore be used to train and test machine learning-based traffic sign detectors. In this contribution, selected datasets are compared regarding their applicability for traffic sign detection. The comparison covers the process used to produce the synthetic images, the virtual worlds needed to produce them, and their environmental conditions, as well as variations in the appearance of traffic signs and the labeling strategies used for the datasets. To evaluate the synthetic SYNTHIA dataset, a deep learning traffic sign detector is trained with multiple training datasets with different ratios between synthetic and real training samples. A test of the detector on real samples only has shown that an overall accuracy and ROC AUC of more than 95% can be achieved for both a small and a large rate of synthetic samples in the training dataset.
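The evaluation protocol above can be sketched as a sweep over the synthetic-to-real ratio when assembling the training set. The loading and training calls in the commented usage example are hypothetical; only the mixing logic is shown.

```python
# Minimal sketch of building training sets with different synthetic/real ratios;
# dataset sizes, ratio values, and the training/evaluation calls are assumptions.
import random

def build_mixed_training_set(real_samples, synthetic_samples, synthetic_ratio, size):
    """Draw a training set of the given size with the requested fraction of
    synthetic samples; the remainder is filled with real samples."""
    n_synth = int(round(synthetic_ratio * size))
    n_real = size - n_synth
    mixed = random.sample(synthetic_samples, n_synth) + random.sample(real_samples, n_real)
    random.shuffle(mixed)
    return mixed

# Hypothetical usage: sweep from purely real to purely synthetic training data
# and evaluate each detector on real test samples only.
# for ratio in (0.0, 0.25, 0.5, 0.75, 1.0):
#     train_set = build_mixed_training_set(real, synthetic, ratio, size=10000)
#     detector = train_detector(train_set)        # hypothetical training call
#     evaluate_on_real_samples(detector)          # accuracy / ROC AUC
```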
Visual SLAM algorithms localize the camera by mapping its environment as a point cloud based on visual cues. To obtain the camera locations in a metric coordinate system, the metric scale of the point cloud has to be known. This contribution describes a method to calculate the metric scale for a point cloud of an indoor environment, such as a parking garage, by fusing multiple individual scale values. The individual scale values are calculated from structures and objects with an a-priori known metric extension, which can be identified in the unscaled point cloud. Extensions of building structures, such as the driving lane or the room height, are derived from density peaks in the point distribution. Extensions of objects, such as traffic signs with a known metric size, are derived by projecting their detections in images onto the point cloud. The method is tested with synthetic image sequences of a drive with a front-looking mono camera through a virtual 3D model of a parking garage. It has been shown that each individual scale value either improves the robustness of the fused scale value or reduces its error. The error of the fused scale is comparable to that of other recent works.
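Each individual scale value is the ratio of an a-priori known metric extension to its extent in the unscaled point cloud; the sketch below fuses several such values. The choice of a weighted median as the fusion rule and the numbers in the example are assumptions for illustration, not necessarily the paper's fusion scheme or measurements.

```python
# Minimal sketch of deriving and fusing individual scale values; the weighted
# median fusion and the example measurements are illustrative assumptions.
import numpy as np

def individual_scale(known_metric_extent, extent_in_point_cloud):
    """One scale value: a-priori known metric extension (e.g. lane width,
    room height, traffic sign size) divided by its unscaled extent."""
    return known_metric_extent / extent_in_point_cloud

def fuse_scales(scales, weights=None):
    """Fuse several individual scale values into one robust metric scale."""
    scales = np.asarray(scales, dtype=float)
    weights = np.ones_like(scales) if weights is None else np.asarray(weights, float)
    # Weighted median: sort by scale and take the value at half the total weight.
    order = np.argsort(scales)
    cum = np.cumsum(weights[order])
    return scales[order][np.searchsorted(cum, 0.5 * cum[-1])]

# Hypothetical example: room height 2.4 m measured as 0.95 units, sign size
# 0.6 m measured as 0.25 units, lane width 3.0 m measured as 1.2 units.
scale = fuse_scales([individual_scale(2.4, 0.95),
                     individual_scale(0.6, 0.25),
                     individual_scale(3.0, 1.2)])
```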
Environment-observing vehicle camera self-calibration using a structure from motion (SfM) algorithm allows calibration over the vehicle lifetime without the need for special calibration objects being present in the calibration images. Critical objects, such as moving, poorly textured, or reflecting objects, may cause scene-specific problems with feature-based correspondence search and reconstruction during the SfM pipeline and may negatively influence the camera calibration. In this contribution, a method to use semantic road scene knowledge by means of semantic masks in a semantic-guided SfM algorithm is proposed to make the calibration more robust. Semantic masks are used to exclude image parts showing critical objects from feature extraction, where the semantic knowledge is obtained by semantic segmentation of the road scene images. The proposed method is tested with an image sequence recorded in a suburban road scene. It has been shown that semantic guidance leads to smaller deviations of the estimated interior orientation and distortion parameters from reference values obtained by test field calibration, compared to a standard SfM algorithm.
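The masking step can be sketched as follows: pixels belonging to critical semantic classes are zeroed in a detection mask, and the feature detector only searches the remaining image area. The class ids and the use of ORB features are assumptions for illustration; the paper's SfM pipeline may use a different detector.

```python
# Minimal sketch of semantic-guided feature extraction: keypoints are detected
# only outside regions labelled as critical classes. Class ids and the ORB
# detector are illustrative assumptions.
import cv2
import numpy as np

CRITICAL_CLASS_IDS = {10, 11, 13}  # e.g. vehicles, pedestrians, sky (assumed ids)

def semantic_mask(label_image):
    """Binary mask: 255 on usable pixels, 0 on pixels of critical objects."""
    mask = np.full(label_image.shape, 255, dtype=np.uint8)
    for class_id in CRITICAL_CLASS_IDS:
        mask[label_image == class_id] = 0
    return mask

def masked_features(image, label_image):
    """Extract features only in image parts not covered by critical objects."""
    detector = cv2.ORB_create(nfeatures=4000)
    mask = semantic_mask(label_image)
    keypoints, descriptors = detector.detectAndCompute(image, mask)
    return keypoints, descriptors
```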