Event-based vision is an emerging field of computer vision with unique properties: sensors produce asynchronous visual output at high temporal resolution, generating data only in response to brightness changes. Combined with frame-based vision, these properties can enable robust, high-temporal-resolution object detection and tracking. In this paper, we present a hybrid, high-temporal-resolution object detection and tracking approach that combines learned and classical methods using synchronized images and event data. Off-the-shelf frame-based object detectors perform the initial object detection and classification. Event masks, generated per detection, then enable inter-frame tracking at varying temporal resolutions using the event data. Detections are associated across time using a simple, low-cost association metric. Moreover, we collect and label a traffic dataset using the hybrid sensor DAVIS 240c, and use it for quantitative evaluation with state-of-the-art detection and tracking metrics. We provide ground-truth bounding boxes and object IDs for each vehicle annotation, and we further generate high-temporal-resolution ground truth to analyze tracking performance at different temporal rates. Our approach shows promising results, with minimal performance deterioration at higher temporal resolutions (48–384 Hz) compared with the baseline frame-based performance at 24 Hz.
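To make the pipeline above concrete, the following is a minimal sketch of one frame-to-frame cycle: a frame-based detector supplies boxes, events between frames re-centre each box at a sub-frame rate, and new detections are matched to tracks with a simple IoU test. The event-mask update shown (re-centring on the mean event location), the greedy matching, and the IoU threshold are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def track_between_frames(tracks, events, steps):
    """Advance each track at a sub-frame rate using only event data.

    `events` is an (N, 4) array of (t, x, y, polarity) rows sorted by
    time; in each of the `steps` sub-intervals every box is re-centred
    on the mean location of the events falling inside it (a stand-in
    for the paper's event-mask update).
    """
    edges = np.linspace(events[0, 0], events[-1, 0], steps + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        chunk = events[(events[:, 0] >= lo) & (events[:, 0] < hi)]
        for tr in tracks:
            x1, y1, x2, y2 = tr["box"]
            inside = chunk[(chunk[:, 1] >= x1) & (chunk[:, 1] < x2) &
                           (chunk[:, 2] >= y1) & (chunk[:, 2] < y2)]
            if len(inside):
                dx = inside[:, 1].mean() - (x1 + x2) / 2
                dy = inside[:, 2].mean() - (y1 + y2) / 2
                tr["box"] = (x1 + dx, y1 + dy, x2 + dx, y2 + dy)
    return tracks

def associate(tracks, detections, thresh=0.3):
    """Greedily match new frame detections to existing tracks by IoU;
    unmatched detections start new tracks."""
    for det in detections:
        scores = [iou(tr["box"], det) for tr in tracks]
        if scores and max(scores) >= thresh:
            tracks[int(np.argmax(scores))]["box"] = det
        else:
            tracks.append({"box": det})
    return tracks
```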
<div class="section abstract"><div class="htmlview paragraph">Adverse weather conditions degrade the quality of images used in vision-based advanced driver assistance systems (ADAS) and autonomous driving algorithms. Adherent raindrops onto a vehicle’s windshield occlude parts of the input image and blur background texture in regions covered by them. Rain also changes image intensity and disturbs chromatic properties of color images. In this work, we collected a dataset using a camera mounted behind a windshield at different rain intensities. The data was processed to generate a set of distorted images by adherent raindrops along with ground truth data of clear images (just after a windshield wipe). We quantitatively evaluated the amount of distortion caused by the raindrops, using the Normalized Cross-Correlation and Structural Similarity methods. While most prior work in the field of rain detection and removal focuses on the image restoration aspects, they typically do not provide quantitative measures to the effect of degradation of input image quality on the performance of image-based algorithms. We quantitatively evaluated the effect of raindrop distortion on deep-learning-based object detection algorithms by comparing the detectors’ performance on the distorted images to the clear images. State-of-the-art detector algorithms were selected and used, namely, Faster Region-based Convolution Neural Network (R-CNN), Single Shot Detector (SSD) and You Only Look Once (YOLO). For the overall performance of the object detection and classification algorithms, we used standard accuracy, precision, and recall measures.</div></div>
Event-based vision has grown rapidly in recent years, driven by its unique characteristics: high temporal resolution (∼1 μs), high dynamic range (>120 dB), and output latency of only a few microseconds. Our work further explores a hybrid, multimodal approach for object detection and tracking that leverages state-of-the-art frame-based detectors, complemented by hand-crafted event-based methods, to improve overall tracking performance with minimal computational overhead. The methods presented include event-based bounding box (BB) refinement, which improves the precision of the resulting BBs, and a continuous event-based object detection method that recovers missed detections and generates inter-frame detections, enabling a high-temporal-resolution tracking output. The advantages of these methods are quantitatively verified by an ablation study using the higher order tracking accuracy (HOTA) metric. Results show significant performance gains: at the baseline frame rate of 24 Hz, HOTA improves from 56.6%, using only frames, to 64.1% and 64.9% for the event-based and edge-based mask configurations combined with the two proposed methods. Likewise, incorporating these methods with the same configurations improves HOTA from 52.5% to 63.1% and from 51.3% to 60.2% at the high-temporal-resolution tracking rate of 384 Hz. Finally, a validation experiment analyzes real-world single-object tracking performance using a high-speed LiDAR. Empirical evidence shows that our approach provides significant advantages over frame-based object detectors alone at the baseline frame rate of 24 Hz and at tracking rates of up to 500 Hz.
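As an illustration of what an event-based bounding-box refinement step might look like, the sketch below tightens a detector's box to the trimmed spatial extent of the events it encloses. The percentile trimming, the inflation margin, and the minimum-event guard are assumptions made for the sketch, not values from the paper.

```python
import numpy as np

def refine_box(box, events, margin=5, pct=2.0):
    """Shrink `box` (x1, y1, x2, y2) to the trimmed extent of the event
    coordinates `events` (an (N, 2) array of x, y) that fall inside the
    margin-inflated box."""
    x1, y1, x2, y2 = box
    m = ((events[:, 0] >= x1 - margin) & (events[:, 0] <= x2 + margin) &
         (events[:, 1] >= y1 - margin) & (events[:, 1] <= y2 + margin))
    pts = events[m]
    if len(pts) < 10:          # too few events: keep the detector's box
        return box
    lo = np.percentile(pts, pct, axis=0)         # trimmed lower corner
    hi = np.percentile(pts, 100 - pct, axis=0)   # trimmed upper corner
    return (lo[0], lo[1], hi[0], hi[1])
```

Because events cluster on moving contours, trimming a small percentile of outlier events on each side yields a box that hugs the object's true extent more tightly than the raw detector output.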