2021
DOI: 10.1109/lra.2021.3060707
Combining Events and Frames Using Recurrent Asynchronous Multimodal Networks for Monocular Depth Prediction

Abstract: Event cameras are novel vision sensors that report per-pixel brightness changes as a stream of asynchronous "events". They offer significant advantages compared to standard cameras due to their high temporal resolution, high dynamic range and lack of motion blur. However, events only measure the varying component of the visual signal, which limits their ability to encode scene context. By contrast, standard cameras measure absolute intensity frames, which capture a much richer representation of the scene. Both…
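To make the sensing model the abstract alludes to concrete, below is a minimal sketch of the standard per-pixel event-generation model (this is background, not the paper's network; the function and parameter names such as `events_from_frames` and `contrast_threshold` are illustrative, and a real event camera fires per pixel asynchronously rather than at frame times as this frame-based approximation does):

```python
import numpy as np

def events_from_frames(frames, timestamps, contrast_threshold=0.2):
    """Illustrative event-generation model (an assumption-laden sketch):
    a pixel emits an event (x, y, t, p) whenever its log-brightness has
    changed by more than a contrast threshold since its last event."""
    log_ref = np.log(frames[0].astype(np.float64) + 1e-6)  # per-pixel reference
    events = []
    for frame, t in zip(frames[1:], timestamps[1:]):
        log_now = np.log(frame.astype(np.float64) + 1e-6)
        diff = log_now - log_ref
        # pixels whose log-brightness rose or fell past the threshold
        ys, xs = np.nonzero(np.abs(diff) >= contrast_threshold)
        for y, x in zip(ys, xs):
            polarity = 1 if diff[y, x] > 0 else -1
            events.append((int(x), int(y), float(t), polarity))
            log_ref[y, x] = log_now[y, x]  # reset reference at fired pixels
    return events
```

This sketch quantizes event timestamps to frame times; it is meant only to illustrate why events capture the varying component of the visual signal while frames capture absolute intensity.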

Cited by 117 publications (64 citation statements: 1 supporting, 63 mentioning, 0 contrasting) | References 36 publications
“…In recent years, event-based vision has attracted increasing research interest in the robotics and computer vision communities [3]. Most existing works have focused on developing event-based methods for well-known fundamental problems such as feature detection and tracking [4], optical flow estimation [5], depth estimation [6], robot localisation [7], motion and object segmentation [8], object detection [9], feedback control [10], and visual servoing [11], among others.…”
Section: Related Work (mentioning)
Confidence: 99%
“…Event-based Vision. Taking advantage of event-based cameras' inherent ability to perceive changes [24,31], researchers have started creating new solutions that tackle traditional computer vision problems by exploiting this new way of sensing the world, including optical flow prediction [37,113], motion segmentation [73,112], depth estimation [35,44], and many others. While traditional cameras provide very rich visual information at the cost of slow and often redundant updates, event-based cameras are asynchronous, spatially sparse, and capable of microsecond temporal resolution.…”
Section: Related Work (mentioning)
Confidence: 99%
“…While traditional cameras provide very rich visual information at the cost of slow and often redundant updates, event-based cameras are asynchronous, spatially sparse, and capable of microsecond temporal resolution. Event-based systems range from designs that exploit and maintain event-camera sparsity during computation [4,85,107] to algorithms that combine events with standard cameras [7,35,46,78,99], exploiting the complementarity of the two. With the goal of minimum-delay computing, research has also focused on asynchronous designs, either by modifying regular CNNs [5,69] or by using dedicated hardware solutions [2,21,29], often leveraging bio-inspired computing frameworks [68].…”
Section: Related Work (mentioning)
Confidence: 99%
“…Compared to standard cameras, event cameras have complementary characteristics such as high dynamic range (140 dB), no motion blur, and microsecond response times [23]. Recently, various event-based approaches have been proposed, including 3D reconstruction and 6-DOF tracking [51], monocular depth prediction [52], optical flow estimation [53], and object detection and recognition [54]. Event cameras asynchronously encode intensity changes at each pixel as tuples of position, time, and polarity: (x, y, t, p).…”
Section: Event-based Vision (mentioning)
Confidence: 99%
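As a concrete illustration of that (x, y, t, p) encoding, here is a minimal sketch of one common way to collapse an asynchronous event stream into a dense image that a frame-based network can consume (assuming events are stored as plain tuples; `accumulate_events` and its arguments are illustrative names, not the cited paper's API, and polarity summation is only one of several possible representations):

```python
import numpy as np

def accumulate_events(events, height, width):
    """Sum event polarities per pixel to build a dense 2D image from an
    asynchronous (x, y, t, p) stream; timestamps are discarded here,
    which is the simplest (lossiest) event representation."""
    img = np.zeros((height, width), dtype=np.int32)
    for x, y, t, p in events:
        img[y, x] += p  # +1 for a brightness increase, -1 for a decrease
    return img

# Hypothetical usage with two events at the same pixel:
# events = [(10, 5, 0.001, 1), (10, 5, 0.004, -1)]
# img = accumulate_events(events, height=180, width=240)
```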