2019
DOI: 10.1007/978-3-030-30642-7_28
Video Synthesis from Intensity and Event Frames

Abstract: Event cameras, neuromorphic devices that naturally respond to brightness changes, have multiple advantages with respect to traditional cameras. However, the difficulty of applying traditional computer vision algorithms on event data limits their usability. Therefore, in this paper we investigate the use of a deep learning-based architecture that combines an initial grayscale frame and a series of event data to estimate the following intensity frames. In particular, a fully-convolutional encoder-decoder network…
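
To make the approach described in the abstract concrete, here is a minimal sketch of a fully-convolutional encoder-decoder that concatenates an initial grayscale frame with a stack of event frames and predicts the following intensity frame. The class name, layer widths, and number of event channels are illustrative assumptions, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

class FrameEventSynthesis(nn.Module):
    """Minimal encoder-decoder sketch: one grayscale frame plus a stack
    of event frames in, the following intensity frame out. Layer sizes
    and the number of event channels are illustrative assumptions."""

    def __init__(self, num_event_channels: int = 8):
        super().__init__()
        in_ch = 1 + num_event_channels  # grayscale + event channels
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
            nn.Sigmoid(),  # predicted intensities in [0, 1]
        )

    def forward(self, gray: torch.Tensor, events: torch.Tensor) -> torch.Tensor:
        # gray: (B, 1, H, W); events: (B, num_event_channels, H, W)
        x = torch.cat([gray, events], dim=1)  # channel-wise fusion
        return self.decoder(self.encoder(x))

# Usage: estimate the frame that follows the initial grayscale frame.
model = FrameEventSynthesis(num_event_channels=8)
gray = torch.rand(1, 1, 128, 128)
events = torch.rand(1, 8, 128, 128)
next_frame = model(gray, events)  # shape (1, 1, 128, 128)
```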

Cited by 14 publications (12 citation statements). References 23 publications.
“…This is why for more complex tasks such as image reconstruction or monocular depth, state-of-the-art methods use a data-driven approach [7], [6], [19], [20], [21]. Of these, many rely on recurrent architectures which can leverage long time windows of events for improved prediction [21], [7]. Although there exist many purely event-based learning methods, few address the fusion of images and events [9], [10], [22]. These approaches fuse both modalities by synchronizing and concatenating both inputs and passing them to a standard feed-forward network [9], [10], [22].…”
Section: Related Work
confidence: 99%
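
The fusion strategy this statement describes (synchronize both modalities, concatenate them, pass the result to a standard feed-forward network) can be sketched as below. The function name, event layout (x, y, t, polarity), and temporal binning scheme are assumptions for illustration, not taken from the cited papers:

```python
import torch

def synchronize_and_concat(frame: torch.Tensor,
                           events: torch.Tensor,
                           t_start: float,
                           t_end: float,
                           num_bins: int = 5) -> torch.Tensor:
    """Bin the events that fall between two frame timestamps into a
    voxel grid and concatenate it channel-wise with the frame.

    frame:  (1, H, W) grayscale frame captured at t_start.
    events: (N, 4) rows of (x, y, t, polarity in {-1.0, +1.0}).
    Returns a (1 + num_bins, H, W) tensor ready for a feed-forward net.
    """
    _, H, W = frame.shape
    voxel = torch.zeros(num_bins, H, W)
    # keep only events inside the inter-frame window [t_start, t_end)
    ev = events[(events[:, 2] >= t_start) & (events[:, 2] < t_end)]
    if ev.numel() > 0:
        # map each event's timestamp to a temporal bin index
        t_norm = (ev[:, 2] - t_start) / (t_end - t_start)
        b = (t_norm * num_bins).long().clamp(max=num_bins - 1)
        x = ev[:, 0].long().clamp(0, W - 1)
        y = ev[:, 1].long().clamp(0, H - 1)
        # accumulate signed polarities per (bin, pixel) cell
        voxel.view(num_bins, -1).index_put_((b, y * W + x), ev[:, 3],
                                            accumulate=True)
    return torch.cat([frame, voxel], dim=0)
```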
“…They have identical residuals and decoders but instead of recurrent state combination operators, they feature recurrent convLSTM encoders at each level. While E only receives voxel grids as input, I receives only gray-scale frames and E+I receives stacks of voxel grids and frames, similar to [9]. For E+I, when a new voxel grid arrives, we stack it with a copy of the last seen image.…”
Section: B Baselines
confidence: 99%
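
As a rough illustration of the E+I baseline described above (a recurrent convLSTM encoder fed with each new voxel grid stacked with a copy of the last seen image), here is a minimal ConvLSTM cell plus the input-assembly loop. The cell implementation and all sizes are assumptions made for this sketch, not the citing authors' code:

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell; a generic sketch, not the cited code."""

    def __init__(self, in_ch: int, hidden_ch: int, k: int = 3):
        super().__init__()
        # a single convolution produces all four gates at once
        self.gates = nn.Conv2d(in_ch + hidden_ch, 4 * hidden_ch, k,
                               padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)),
                                 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

# E+I input assembly: each new voxel grid is stacked with a copy of
# the last seen grayscale frame before entering the recurrent encoder.
num_bins, H, W = 5, 64, 64
cell = ConvLSTMCell(in_ch=num_bins + 1, hidden_ch=32)
h = torch.zeros(1, 32, H, W)
c = torch.zeros(1, 32, H, W)
last_image = torch.rand(1, 1, H, W)  # most recently seen frame
for _ in range(3):  # a short stream of incoming voxel grids
    voxel = torch.rand(1, num_bins, H, W)
    x = torch.cat([voxel, last_image], dim=1)  # events + image copy
    h, c = cell(x, (h, c))  # h is the encoder state after this step
```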