2022 IEEE International Conference on Image Processing (ICIP)
DOI: 10.1109/icip46576.2022.9897432

Exploiting Spatial Sparsity for Event Cameras with Visual Transformers

Abstract: Event cameras report local changes of brightness through an asynchronous stream of output events. Events are spatially sparse at pixel locations with little brightness variation. We propose using a visual transformer (ViT) architecture to leverage its ability to process a variable-length input. The input to the ViT consists of events that are accumulated into time bins and spatially separated into non-overlapping subregions called patches. Patches are selected when the number of nonzero pixel locations within …
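The pipeline described in the abstract (accumulate events into time bins, split each bin into non-overlapping patches, keep only sufficiently active patches) can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: the function names, the activity threshold min_active_pixels, and the per-pixel event-count representation are assumptions made for the example.

import numpy as np

def accumulate_events(events, height, width):
    """Accumulate one time bin of events into a per-pixel count frame.
    events: (N, 2) integer array of (y, x) pixel coordinates."""
    frame = np.zeros((height, width), dtype=np.int32)
    np.add.at(frame, (events[:, 0], events[:, 1]), 1)
    return frame

def select_active_patches(event_frame, patch_size=16, min_active_pixels=8):
    """Split an accumulated event frame into non-overlapping patches and
    keep only those with enough nonzero pixel locations (hypothetical
    threshold). Returns the kept patches and their grid coordinates,
    which a ViT can consume as a variable-length token sequence."""
    H, W = event_frame.shape
    ph, pw = H // patch_size, W // patch_size
    # Rearrange into a (ph, pw, patch_size, patch_size) grid of patches.
    grid = (event_frame[:ph * patch_size, :pw * patch_size]
            .reshape(ph, patch_size, pw, patch_size)
            .transpose(0, 2, 1, 3))
    # Count nonzero pixel locations per patch: the sparsity criterion.
    active = (grid != 0).sum(axis=(2, 3))
    keep = active >= min_active_pixels
    coords = np.argwhere(keep)   # grid positions, usable for positional encoding
    patches = grid[keep]         # (N, patch_size, patch_size), N varies per bin
    return patches, coords

Because the number of retained patches varies from one time bin to the next, the downstream transformer receives a variable-length token sequence; this is the property of ViTs that the abstract, and the citation statements below, highlight.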

Cited by 19 publications (4 citation statements)
References 12 publications
“…With the recent success of transformer networks in natural language processing, and subsequent use in computer vision, there has been interest in transformer architectures with events. In fact, some characteristics of these visual transformers make them particularly compatible with event data, such as the handling of variable-length inputs and reduced computation by leveraging the event data's ability to highlight active regions [259]. Table 14 presents a summary of recent works in event camera literature that use transformer architectures, along with their applications and the event representation used.…”
Section: E. Need of Transformers (mentioning, confidence: 99%)
“…Evaluated on N-MNIST [25] and ASL-DVS [26], table 1 presents the top-1 recognition accuracy on raw events, on raw events with 50% random noise, on the encrypted events by the competitor (Du et al [3]) and on our encrypted events. The approaches, which are either grid-based [13,29,33] or graph-based [32], can still have good performance when the data are filled with random noise, but they fail for recognition on the encrypted ones. In figure 5, we diagnose the learning process and the feature responses of a trained network.…”
Section: Attacks From High-level Neuromorphic Reasoning (mentioning, confidence: 99%)
“…In event-based vision, attention-based components have found applications in classification [42,53], image reconstruction [54], and monocular depth estimation [31], but their use in object detection has yet to be investigated.…”
Section: Vision Transformers for Spatio-temporal Data (mentioning, confidence: 99%)