Replay Attack Detection Based on Spatial and Spectral Features of Stereo Signal

Yaguchi, Ryoya; Shiota, Sayaka; Kiya, Hitoshi

doi:10.2197/ipsjjip.29.275

Cited by 2 publications

(2 citation statements)

References 19 publications

“…They emphasized on the non-speech segments and used spatial features based on the generalized cross correlation to identify the difference. Yaguchi et al [25] investigated the logSpec and cepstral coefficients to enhance the identification of attacks. The first feature is based on a ratio of the noise and harmonic sub-band.…”

Section: Literature Review (mentioning)

confidence: 99%

Voice spoofing countermeasure for voice replay attacks using deep learning

et al. 2022

In our everyday lives, we communicate with each other using several means and channels of communication, as communication is crucial in the lives of humans. Listening and speaking are the primary forms of communication. For listening and speaking, the human voice is indispensable. Voice communication is the simplest type of communication. The Automatic Speaker Verification (ASV) system verifies users with their voices. These systems are susceptible to voice spoofing attacks - logical and physical access attacks. Recently, there has been a notable development in the detection of these attacks. Attackers use enhanced gadgets to record users’ voices, replay them for the ASV system, and be granted access for harmful purposes. In this work, we propose a secure voice spoofing countermeasure to detect voice replay attacks. We enhanced the ASV system security by building a spoofing countermeasure dependent on the decomposed signals that consist of prominent information. We used two main features— the Gammatone Cepstral Coefficients and Mel-Frequency Cepstral Coefficients— for the audio representation. For the classification of the features, we used Bi-directional Long-Short Term Memory Network in the cloud, a deep learning classifier. We investigated numerous audio features and examined each feature’s capability to obtain the most vital details from the audio for it to be labelled genuine or a spoof speech. Furthermore, we use various machine learning algorithms to illustrate the superiority of our system compared to the traditional classifiers. The results of the experiments were classified according to the parameters of accuracy, precision rate, recall, F1-score, and Equal Error Rate (EER). The results were 97%, 100%, 90.19% and 94.84%, and 2.95%, respectively.
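The precision, recall, and F1-score reported in the abstract above can be cross-checked against each other, since F1 is the harmonic mean of precision and recall. The following is a minimal sketch of that check; the function name `f1_score` and the variable names are illustrative and do not come from the cited paper, and the only inputs are the percentages quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract (assumed to be percentages of the same test set).
reported_precision = 1.0     # 100%
reported_recall = 0.9019     # 90.19%

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1 * 100:.2f}%")  # prints "F1 = 94.84%"
```

The computed value agrees with the 94.84% F1-score stated in the abstract, so the three reported figures are mutually consistent. The accuracy and EER cannot be checked this way because the abstract does not give the class counts.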

“…They emphasized on the non-speech segments and used spatial features based on the generalized cross correlation to identify the difference. Yaguchi et al [21] investigated the logSpec and cepstral coefficients to enhance the identification of attacks. The first feature is based on a ratio of the noise and harmonic sub-band.…”

Section: Literature Review (mentioning)

confidence: 99%

Voice Spoofing Countermeasure for Voice Replay Attacks using Deep Learning

Zhou, Tao, Jawawi, et al. 2022

Preprint

In our everyday lives, we communicate with each other using several means and channels of communication, as communication is crucial in the lives of humans. Listening and speaking are the primary forms of communication. For listening and speaking, the human voice is indispensable. Voice communication is the simplest type of communication. The Automatic Speaker Verification (ASV) system verifies users with their voices. These systems are susceptible to voice spoofing attacks - logical and physical access attacks. Recently, there has been a notable development in the detection of these attacks. Attackers use enhanced gadgets to record users' voices, replay it for the ASV system, and be granted access for harmful purposes. In this work, we propose a secure voice spoofing countermeasure for the purpose of detecting voice replay attacks. We enhanced the ASV system security by building a spoofing countermeasure dependent on the decomposed signals that consists of prominent information. We used two main features— the Gammatone Cepstral Coefficients and Mel-Frequency Cepstral Coefficients— for the audio representation. For the classification of the features, we used Bi-directional Long-Short Term Memory Network in cloud, a deep learning classifier. We investigated numerous audio features and examined each feature’s capability to obtain the most vital details from the audio for it to be labelled genuine or a spoof speech. Furthermore, we use various machine learning algorithms to illustrate the superiority of our system compared to the traditional classifiers. The results of the experiments were classified according to the parameters of accuracy, precision rate, recall, F1-score, and Equal Error Rate (EER). The results were 97%, 100%, 90.19% and 94.84%, and 2.95%, respectively.
