2012
DOI: 10.1109/tasl.2012.2201472

Evaluation of Speaker Verification Security and Detection of HMM-Based Synthetic Speech

Abstract: In this paper, we evaluate the vulnerability of speaker verification (SV) systems to synthetic speech. The SV systems are based on either the Gaussian mixture model-universal background model (GMM-UBM) or support vector machine (SVM) using GMM supervectors. We use a hidden Markov model (HMM)-based text-to-speech (TTS) synthesizer, which can synthesize speech for a target speaker using small amounts of training data through model adaptation of an average voice or background model. Although the SV system…
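As a rough illustration of the GMM-UBM approach named in the abstract, the sketch below scores a test utterance by the average per-frame log-likelihood ratio between a target-speaker GMM and the UBM. It is a minimal toy under stated assumptions, not the paper's system: the function names, the two-component diagonal-covariance models, the feature values, and the decision threshold of 0.0 are all hypothetical, and the target model is set by hand here rather than adapted from the UBM.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, gmm):
    """Log p(x | gmm); gmm is a list of (weight, mean, var) components."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def llr_score(frames, target_gmm, ubm):
    """Average per-frame log-likelihood ratio: target model versus UBM."""
    total = sum(
        gmm_log_likelihood(x, target_gmm) - gmm_log_likelihood(x, ubm)
        for x in frames
    )
    return total / len(frames)

# Hypothetical toy models: the target differs from the UBM in one component mean.
ubm = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [3.0, 3.0], [1.0, 1.0])]
target = [(0.5, [0.5, 0.5], [1.0, 1.0]), (0.5, [3.0, 3.0], [1.0, 1.0])]

# Hypothetical feature frames that lie close to the shifted target component.
frames = [[0.6, 0.4], [0.5, 0.5]]

THRESHOLD = 0.0  # assumed operating point, chosen only for illustration
accepted = llr_score(frames, target, ubm) > THRESHOLD
```

A score above the threshold accepts the identity claim. Synthetic speech that matches the target model well would also score high under this rule, which is the kind of vulnerability the paper sets out to evaluate.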

Cited by 211 publications (118 citation statements)
References 40 publications
“…These methods use features mostly based on the audio spectrogram, such as spectral- and cepstral-based features [17], phase-based features [18], the combination of amplitude and phase features [19], and audio quality based features [20]. Features directly extracted from a spectrogram can also be used, as per the recent work that relies on local maxima of spectrogram [21], which showed an impressive performance, albeit for the evaluation database that was based on a set of speech recordings collected with VoIP phones, which provided little challenge for an anti-spoofing system.…”
Section: A Features (mentioning)
confidence: 99%
“…Existing approaches to feature extraction for speech spoofing attack detection methods include spectral- and cepstral-based features [8], phase-based features [9], the combination of amplitude and phase features of the spectrogram [17], and audio quality based features [18]. Features directly extracted from a spectrogram can also be used, as per the recent work that relies on local maxima of spectrogram [19].…”
Section: Features (mentioning)
confidence: 99%
“…Compared to cepstral coefficients, using phase information extracted from the signal seems to be more effective for anti-spoofing detection, as was shown by De Leon et al. [9] and Wu et al. [20]. However, the most popular recent approaches rely on the combination of spectral-based and phase-based features [17,21,22,23].…”
Section: Features (mentioning)
confidence: 99%
“…Relative Phase Shift (RPS) representation (Saratxaga et al., 2009) for the harmonic phase has also been used to build SSD systems aimed to detect spoofing signals created with adapted synthetic voices (De Leon et al., 2011; De Leon et al., 2012) with good results. The initial works were focused on evaluating the actual capability of the RPSs to detect the phase modifications due to the synthetic generation of the spoofing signals.…”
Section: Introduction (mentioning)
confidence: 99%