ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
DOI: 10.1109/icassp39728.2021.9413961

Self-Attentive VAD: Context-Aware Detection of Voice from Noise

Abstract: Recent voice activity detection (VAD) schemes have aimed at leveraging the decent neural architectures, but few were successful with applying the attention network due to its high reliance on the encoder-decoder framework. This has often let the built systems have a high dependency on the recurrent neural networks, which are costly and sometimes less context-sensitive considering the scale and property of acoustic frames. To cope with this issue with the self-attention mechanism and achieve a simple, powerful, …

Cited by 8 publications (1 citation statement)
References 14 publications

“…In summary, although existing methods can detect audio copy-move forgery, there are still some issues or limitations that deserve attention and improvement. Existing passive forensics methods overly rely on segmentation techniques to segment audio into silent and voiced segments, but audio collected in noisy environments is often difficult to segment [18,19]. These methods require repeatedly adjusting the similarity threshold to make feature similarity decisions in order to determine whether there are copy-move forged segments in the audio.…”
Section: Introduction (citation type: mentioning)
confidence: 99%