2021 IEEE Symposium on Security and Privacy (SP)
DOI: 10.1109/sp40001.2021.00009
Hear "No Evil", See "Kenansville": Efficient and Transferable Black-Box Attacks on Speech Recognition and Voice Identification Systems

Abstract: Automatic speech recognition and voice identification systems are being deployed in a wide array of applications, from providing control mechanisms to devices lacking traditional interfaces, to the automatic transcription of conversations and authentication of users. Many of these applications have significant security and privacy considerations. We develop attacks that force mistranscription and misidentification in state of the art systems, with minimal impact on human comprehension. Processing pipelines for…

Cited by 63 publications (109 citation statements)
References 51 publications
“…Meanwhile, some works have successfully provided adversarial samples on audio‐based systems with different settings and optimized objects. Such systems include speech recognition 19–22 and speaker recognition 23–26 . In the white‐box scenario, the process of generating adversarial samples using GD 17 can be simply defined as δ ← δ − lr·∇δL, where ∇δL is the gradient of the loss L with respect to δ.…”
Section: Related Work
confidence: 99%
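The white-box gradient-descent update quoted above can be sketched as follows. The loss function here is a toy, hypothetical stand-in for an attack loss L on a perturbed input — not a real speech or speaker model from the cited works.

```python
import numpy as np

# Toy, hypothetical stand-in for an attack loss L(delta): drive a scalar
# "model score" of the perturbed input x + delta toward a target value.
def loss(delta, x, target):
    score = np.sum((x + delta) ** 2)
    return (score - target) ** 2

def grad_loss(delta, x, target):
    # Analytic gradient of the toy loss with respect to delta.
    score = np.sum((x + delta) ** 2)
    return 4.0 * (score - target) * (x + delta)

def gd_attack(x, target, lr=5e-3, steps=2000):
    # White-box gradient descent: delta <- delta - lr * grad_delta(L).
    delta = np.zeros_like(x)
    for _ in range(steps):
        delta = delta - lr * grad_loss(delta, x, target)
    return delta

x = np.array([1.0, -0.5, 0.25])
delta = gd_attack(x, target=0.1)
```

With a real model, `grad_loss` would come from automatic differentiation through the recognition pipeline; the update rule itself is unchanged.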
“…For adversarial sample attacks, existing studies have largely focused on the space of images 17,18 . Some studies have successfully provided adversarial samples on audio‐based systems, including speech recognition 19–22 and speaker recognition 23–26 . The models used in the audio‐based systems are mainly composed of Time Delay Neural Network (TDNN), Long Short‐Term Memory (LSTM), or transformer encoders, which encode high‐dimensional temporal audio into low‐dimensional feature embedding.…”
Section: Introduction
confidence: 99%
“…A few others specifically targeted audio inputs: the earliest was the ultrasonic-based DolphinAttack (Zhang et al., 2017) and the Houdini loss for structured models (Cisse et al., 2017), followed by the effective and popular Carlini&Wagner (CW) attack for audio. Other works have extended the state of the art with over-the-air attacks (Yuan et al., 2018; Yakura and Sakuma, 2019) and black-box attacks that do not require gradient access and transfer well (Abdullah et al., 2021). A recent line of work has improved the imperceptibility of adversarial noise by using psychoacoustic models to constrain the noise rather than standard L2 or L∞ bounds (Schönherr et al., 2019; Qin et al., 2019).…”
Section: Attacks
confidence: 99%
“…Provided with an input, these attacks run gradient-based iterations to craft an additive noise. They are the hardest attacks to defend against, and a good metric for evaluating defenses that will carry over well to more practical attacks run over the air, without gradient access, or in real time (Abdullah et al., 2021). We consider two threat models.…”
Section: Adversarial Attacks On Speech Recognition
confidence: 99%
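Gradient-based additive-noise attacks of the kind described above are commonly implemented as projected gradient descent under an L∞ bound. A minimal sketch, assuming a hypothetical `grad_fn` that returns the loss gradient at a given input (here a toy placeholder, not a real recognizer):

```python
import numpy as np

def pgd_attack(x, grad_fn, eps=0.01, alpha=0.002, steps=40):
    # Iteratively craft additive noise: take a signed gradient step,
    # then project the noise back into the L-infinity ball [-eps, eps].
    delta = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x + delta)
        delta = np.clip(delta - alpha * np.sign(g), -eps, eps)
    return delta

# Hypothetical gradient of a toy loss L(y) = ||y||^2 (placeholder model).
toy_grad = lambda y: 2.0 * y

x = np.array([1.0, -0.5, 0.25])
delta = pgd_attack(x, toy_grad)
```

The `eps` bound is what psychoacoustically motivated attacks replace with a frequency-dependent masking threshold, as the quote notes.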