2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018)
DOI: 10.1109/fg.2018.00055
Energy and Computation Efficient Audio-Visual Voice Activity Detection Driven by Event-Cameras

Cited by 8 publications (8 citation statements) | References 15 publications
“…Such cases are especially common in robotics and in some mobile device applications. For instance, as a mobile phone application, our method can be used either as a pose-free or a pose-dependent extension of voice activity detection, which has been shown to be power efficient and noise robust [6]. Since the phone stays active for long durations, EC-based processing can save considerable energy, with the added advantage of using the visual channel to cope with noisy acoustic environments.…”
Section: Discussion
confidence: 99%
“…Recovery of the facial motion field and brightness while the camera is in motion was demonstrated in several studies [34, 54], in settings where conventional cameras suffer from dynamic-range limitations and motion blur. Finally, Savran et al. [6] applied spatio-temporal event-based convolution to localize and detect lip activity, and Li et al. [7] proposed a DNN on audio-visual events for speech recognition.…”
Section: Related Work
confidence: 99%
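As background for the spatio-temporal event-based processing mentioned above: an event camera emits a stream of (x, y, t, polarity) tuples rather than frames. A common preprocessing step before applying convolutions is to accumulate events into a spatio-temporal voxel grid. The sketch below is purely illustrative and is not the implementation from any of the cited papers; the sensor size, bin count, and binning scheme are arbitrary choices for the example.

```python
import numpy as np

def events_to_voxel_grid(events, height, width, num_bins):
    """Accumulate (x, y, t, polarity) events into a spatio-temporal voxel grid.

    events: array of shape (N, 4) with columns x, y, t, p (p in {-1, +1}).
    Returns a (num_bins, height, width) float array, to which ordinary
    2D/3D convolutions can then be applied.
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    t = events[:, 2]
    # Normalize timestamps into [0, num_bins) and clip the last event
    # into the final bin.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * num_bins
    bins = np.clip(t_norm.astype(int), 0, num_bins - 1)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    p = events[:, 3]
    # Unbuffered accumulation: correctly sums repeated (bin, y, x) indices.
    np.add.at(grid, (bins, y, x), p)
    return grid

# Toy example: four events on a 4x4 sensor, split into 2 temporal bins.
events = np.array([
    [0, 0, 0.00, +1],
    [1, 1, 0.25, -1],
    [2, 2, 0.60, +1],
    [3, 3, 1.00, +1],
])
grid = events_to_voxel_grid(events, height=4, width=4, num_bins=2)
print(grid.shape)  # (2, 4, 4)
```

`np.add.at` is used instead of fancy-indexed `+=` so that multiple events landing in the same voxel are summed rather than overwritten.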
“…The investigation of DNNs together with multimodal spiking sensors is still relatively rare. Previous studies of sensor fusion using DNNs on multimodal spiking sensors include a spiking Deep Belief Network with the DVS and DAS [16], hardware equivalents for inference [17], and analog CNNs and RNNs using event-driven spike features [15]. Another study used event cameras with audio input for voice recognition [18].…”
Section: Introduction
confidence: 99%