Automatic speaker verification (ASV) systems are subject to various kinds of malicious attacks. Replay, voice conversion, and speech synthesis attacks drastically degrade the performance of a standard ASV system by increasing its false acceptance rate. This issue has attracted considerable interest in the speech research community, where possible voice spoofing attacks and their countermeasures have been investigated. However, much less effort has been devoted to creating realistic and diverse spoofing attack databases that allow researchers to properly evaluate their countermeasures against attacks. Existing studies are incomplete in terms of the types of attacks covered, and are often difficult to reproduce because public databases are unavailable. In this paper we introduce the voice spoofing dataset of AVspoof, a public audio-visual spoofing database. AVspoof includes ten realistic spoofing threats generated using replay, speech synthesis, and voice conversion. In addition, we provide a set of experimental results showing the effect of such attacks on current state-of-the-art ASV systems.
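To make the false-acceptance effect concrete, here is a minimal illustrative sketch (not from the paper, with entirely synthetic score distributions): at a decision threshold tuned on zero-effort impostors, spoofed trials whose scores mimic genuine speakers sharply inflate the false acceptance rate (FAR).

```python
# Illustrative sketch: spoofed trials inflate the false acceptance rate (FAR)
# of an ASV system at a fixed threshold. All scores here are synthetic;
# in practice they would come from an ASV back-end.
import numpy as np

rng = np.random.default_rng(0)
genuine = rng.normal(2.0, 1.0, 1000)    # target-speaker trial scores
zero_eff = rng.normal(-2.0, 1.0, 1000)  # zero-effort impostor scores
spoofed = rng.normal(1.5, 1.0, 1000)    # replay/VC/TTS scores mimic targets

threshold = 0.0  # operating point tuned on zero-effort impostors

def far(scores, thr):
    """Fraction of impostor trials accepted (score >= threshold)."""
    return float(np.mean(scores >= thr))

print(f"FAR, zero-effort impostors: {far(zero_eff, threshold):.3f}")
print(f"FAR, spoofing attacks:      {far(spoofed, threshold):.3f}")
```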
Voice is projected to be the next input interface for portable devices. The increased use of audio interfaces can be mainly attributed to the success of speech and speaker recognition technologies. With these advances comes the risk of criminal threats, where attackers reportedly try to access sensitive information using diverse voice spoofing techniques. Among them, replay attacks pose a real challenge to voice biometrics. This paper addresses the problem by proposing a deep learning architecture in tandem with low-level cepstral features. We investigate the use of a deep neural network (DNN) to discriminate between the different channel conditions available in the ASVSpoof 2017 dataset, namely recording, playback, and session conditions. The high-level feature vectors derived from this network are used to discriminate between genuine and spoofed audio. Two kinds of low-level features are utilized: state-of-the-art constant-Q cepstral coefficients (CQCC), and our proposed high-frequency cepstral coefficients (HFCC), derived from the high-frequency spectrum of the audio. The fusion of both features proved effective in generalizing across the diverse replay attacks seen in the evaluation of the ASVSpoof 2017 challenge, with an equal error rate of 11.5%, which is 53% better than the baseline Gaussian Mixture Model (GMM) applied to CQCC.
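As a rough illustration of the constant-Q cepstral front-end mentioned above, the sketch below computes simplified constant-Q cepstra with librosa. Note that true CQCC (as used in the ASVSpoof baselines) additionally resamples the log constant-Q spectrum uniformly before the DCT, a step omitted here; the function name and parameter choices are our own, not from the paper.

```python
# Simplified constant-Q cepstral features in the spirit of CQCC.
# True CQCC also uniformly resamples the log constant-Q spectrum before the
# DCT; that step is omitted here, so these are not the exact challenge
# features. Function name and parameters are illustrative assumptions.
import librosa
import numpy as np
from scipy.fft import dct

def constant_q_cepstra(wav_path, n_coeffs=20):
    y, sr = librosa.load(wav_path, sr=16000)
    # Constant-Q magnitude spectrogram: 7 octaves, 12 bins per octave
    C = np.abs(librosa.cqt(y, sr=sr, n_bins=84, bins_per_octave=12))
    log_power = np.log(C ** 2 + 1e-10)
    # DCT along the frequency axis yields cepstral coefficients per frame
    return dct(log_power, type=2, axis=0, norm='ortho')[:n_coeffs].T
```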
Audio-Visual People Diarization (AVPD) is an original framework that simultaneously improves audio, video, and audiovisual diarization results. Following a review of the literature on people diarization for both audio and video content, including its limitations and our own contributions, we describe a proposed method for associating audio and video information using co-occurrence matrices, and present experiments conducted on a corpus containing TV news, TV debates, and movies. Results show the effectiveness of the overall diarization system and confirm the gains that audio information can bring to video indexing and vice versa.
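The co-occurrence association step can be sketched as follows (our illustration, not the authors' code): accumulate the temporal overlap between each audio speaker cluster and each video face cluster into a matrix, then match clusters, here with the Hungarian algorithm. The (cluster_id, start_seconds, end_seconds) segment format is a hypothetical convention for this example.

```python
# Sketch of audio/video cluster association via a co-occurrence matrix.
# Segments are hypothetical (cluster_id, start_seconds, end_seconds) tuples.
import numpy as np
from scipy.optimize import linear_sum_assignment

def cooccurrence(audio_segs, video_segs, n_speakers, n_faces):
    """Accumulate overlap duration between speaker and face clusters."""
    M = np.zeros((n_speakers, n_faces))
    for s, a0, a1 in audio_segs:
        for f, v0, v1 in video_segs:
            M[s, f] += max(0.0, min(a1, v1) - max(a0, v0))  # overlap (s)
    return M

audio = [(0, 0.0, 5.0), (1, 5.0, 9.0), (0, 9.0, 12.0)]
video = [(0, 0.0, 6.0), (1, 6.0, 12.0)]
M = cooccurrence(audio, video, n_speakers=2, n_faces=2)
rows, cols = linear_sum_assignment(-M)  # maximize total overlap
print(dict(zip(rows.tolist(), cols.tolist())))  # speaker -> face mapping
```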