When the Differences in Frequency Domain are Compensated: Understanding and Defeating Modulated Replay Attacks on Automatic Speech Recognition

Wang, Shu; Cao, Jiahao; He, Xu; Sun, Kun; Li, Qi

doi:10.1145/3372297.3417254

Cited by 30 publications

(11 citation statements)

References 52 publications


“…Meanwhile, recent studies have shown ASR systems are vulnerable to various malicious voice attacks [29], [67], [50], [38], [69], [57], [81], [61], [56], [1], [15], [58], [26], [16], [80], [27]. As an alternative representation of voice signals, frequency spectrum has been manipulated by attackers to achieve different attacking goals.…”

Section: Introduction (mentioning)

confidence: 99%

Compensating Removed Frequency Components: Thwarting Voice Spectrum Reduction Attacks

Wang,

Sun,

2024

Proceedings 2024 Network and Distributed System Security Symposium


Automatic speech recognition (ASR) provides diverse audio-to-text services that let humans communicate with machines. However, recent research reveals that ASR systems are vulnerable to various malicious audio attacks. In particular, by removing non-essential frequency components, a new spectrum reduction attack can generate adversarial audio that remains intelligible to humans but cannot be correctly interpreted by ASR systems. This poses a new challenge for content moderation solutions that must detect harmful content in audio and video on social media platforms. In this paper, we propose an acoustic compensation system named ACE to counter spectrum reduction attacks on ASR systems. Our design is based on two observations: frequency component dependencies and perturbation sensitivity. First, because Discrete Fourier Transform (DFT) computation inevitably introduces spectral leakage and aliasing into the audio frequency spectrum, frequency components with similar frequencies are highly correlated. Given these intrinsic dependencies between neighboring frequency components, it is possible to recover more of the original audio by compensating for the removed components based on the remaining ones. Second, since the components removed in a spectrum reduction attack can be regarded as the inverse of adversarial noise, the attack success rate decreases when the adversarial audio is replayed over the air. Hence, we model the acoustic propagation process to add over-the-air perturbations to the attacked audio. We implement a prototype of ACE, and experiments show that ACE reduces up to 87.9% of the ASR inference errors caused by spectrum reduction attacks. Furthermore, by analyzing the residual errors on real audio samples, we summarize six general types of ASR inference errors and investigate their causes and potential mitigation solutions.
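To make the first observation concrete, here is a small standard-library Python sketch using a naive DFT. It shows spectral leakage (a tone that falls between two DFT bins puts energy into a distant bin, while a tone exactly on a bin does not), then zeroes one bin to mimic a spectrum reduction and refills it by averaging its two neighbors. The helper names `dft`, `tone`, `remove_bins`, and `compensate`, the 64-point frame, and the linear-interpolation rule are illustrative assumptions only; they are not ACE's actual compensation model.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (illustrative, not optimized)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def tone(freq_bins, n):
    """Unit-amplitude cosine whose frequency is expressed in DFT-bin units."""
    return [math.cos(2 * math.pi * freq_bins * t / n) for t in range(n)]


def remove_bins(spectrum, bins):
    """Hypothetical spectrum reduction: zero each bin and its conjugate mirror."""
    n = len(spectrum)
    out = list(spectrum)
    for k in bins:
        out[k] = 0j
        out[n - k] = 0j
    return out


def compensate(spectrum, bins):
    """Hypothetical compensation: linearly interpolate each removed bin from
    its two neighbors, then restore conjugate symmetry (real-valued signal)."""
    n = len(spectrum)
    out = list(spectrum)
    for k in bins:
        estimate = (out[k - 1] + out[k + 1]) / 2
        out[k] = estimate
        out[n - k] = estimate.conjugate()
    return out


N = 64
leaky = dft(tone(10.5, N))   # frequency falls between bins 10 and 11
on_bin = dft(tone(10.0, N))  # frequency lies exactly on bin 10

# Leakage: the distant bin 20 carries energy only for the off-bin tone.
print(abs(leaky[20]) > 0.5, abs(on_bin[20]) < 1e-9)

attacked = remove_bins(leaky, [20])
restored = compensate(attacked, [20])
err_removed = abs(attacked[20] - leaky[20])
err_restored = abs(restored[20] - leaky[20])
print(err_restored < err_removed)
```

Averaging neighbors works in this toy case only because the leaked spectrum varies smoothly away from the peak; a removed bin right next to the spectral peak, or a wide band of removed bins, would need a richer dependency model.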


“…In the 2010s, traditional biometric authentication thrived, for example, using face recognition to unlock a smartphone and fingerprint recognition to unlock a door. Nevertheless, these traditional biometric authentication are vulnerable to replay and presentation attacks [5], [6].…”

Section: Introduction (mentioning)

confidence: 99%

SoK: An Overview of PPG's Application in Authentication

Li, Chen, Pan et al.

2022

Preprint


Biometric authentication prospered during the 2010s. Vulnerability to spoofing attacks remains an inherent problem with traditional biometrics. Recently, unobservable physiological signals (e.g., electroencephalography, photoplethysmography, electrocardiography) have been considered as biometrics that could solve this problem. In particular, photoplethysmography (PPG) measures changes in blood flow in the human body by optical means. Clinically, researchers commonly use PPG signals to obtain patients' blood oxygen saturation, heart rate, and other information to assist in diagnosing heart-related diseases. Since PPG signals are easy to obtain and contain a wealth of individual cardiac information, researchers have begun to explore their potential applications in information security. The unique advantages of the PPG signal (simple acquisition, difficulty of theft, and liveness detection) allow it to improve the security and usability of authentication in various respects. However, research on PPG-based authentication is still in its infancy, and the lack of systematization hinders new research in this field. We conduct a comprehensive study of PPG-based authentication and discuss the limitations of these applications before pointing out future research directions.


“…Two main security attacks have recently emerged to tamper with speaker verification systems. One is called replay attacks that record the legitimate user's speech and then replay it to fool a speaker verification system [19]. Such a sniffing and spoofing attack requires an attacker to obtain a legitimate user's audio.…”

Section: Introduction (mentioning)

confidence: 99%

On the Detection of Adaptive Adversarial Attacks in Speaker Verification Systems

Chen

2022

Preprint


Speaker verification systems have been widely used in smartphones and Internet of Things devices to identify legitimate users. Recent work has shown that adversarial attacks, such as FAKEBOB, can work effectively against speaker verification systems. The goal of this paper is to design a detector that can distinguish original audio from audio contaminated by adversarial attacks. Specifically, our detector, called MEH-FEST, calculates the minimum energy in high frequencies from the short-time Fourier transform (STFT) of an audio signal and uses it as a detection metric. Through both analysis and experiments, we show that the proposed detector is easy to implement, fast at processing input audio, and effective at determining whether audio has been corrupted by FAKEBOB attacks. The experimental results indicate that the detector is extremely effective, with near-zero false positive and false negative rates for detecting FAKEBOB attacks on Gaussian mixture model (GMM) and i-vector speaker verification systems. Moreover, we discuss and study adaptive adversarial attacks against the proposed detector and their countermeasures, illustrating the game between attackers and defenders.
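One plausible reading of the MEH-FEST metric can be sketched in standard-library Python: split the audio into Hann-windowed STFT frames, sum the squared DFT magnitudes of the bins at or above a cutoff frequency, and take the minimum over all frames. The intuition assumed here is that clean speech contains some frames (pauses) with almost no high-frequency energy, whereas an additive adversarial perturbation leaves energy in every frame. The 6 kHz cutoff, 256-sample frames, 128-sample hop, decision threshold, and the function names are hypothetical choices rather than the paper's parameters, and uniform noise merely stands in for a FAKEBOB perturbation.

```python
import cmath
import math
import random


def frame_energy_above(frame, sample_rate, cutoff_hz):
    """Energy of the positive-frequency DFT bins at or above cutoff_hz."""
    n = len(frame)
    energy = 0.0
    for k in range(n // 2 + 1):
        if k * sample_rate / n < cutoff_hz:
            continue  # skip low-frequency bins
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        energy += abs(coeff) ** 2
    return energy


def min_high_freq_energy(signal, sample_rate, cutoff_hz, frame_len=256, hop=128):
    """Minimum high-frequency energy over Hann-windowed STFT frames."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len) for t in range(frame_len)]
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(frame_len)]
        energies.append(frame_energy_above(frame, sample_rate, cutoff_hz))
    return min(energies)


def is_suspicious(signal, sample_rate, cutoff_hz=6000.0, threshold=1e-3):
    """Hypothetical rule: flag audio whose quietest frame still has high-frequency energy."""
    return min_high_freq_energy(signal, sample_rate, cutoff_hz) > threshold


SR = 16000
# Toy "clean" audio: a 300 Hz tone for 0.125 s followed by 0.125 s of silence.
clean = [0.5 * math.sin(2 * math.pi * 300 * t / SR) if t < SR // 8 else 0.0
         for t in range(SR // 4)]
rng = random.Random(0)
perturbed = [s + rng.uniform(-0.01, 0.01) for s in clean]  # stand-in perturbation

print(is_suspicious(clean, SR), is_suspicious(perturbed, SR))
```

With these toy signals the clean audio's quietest frame is exactly silent, so its score is zero, while the perturbed copy scores well above the threshold; on real recordings the cutoff and threshold would have to be calibrated against clean and attacked data.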
