On the use of perceptual Line Spectral pairs Frequencies and higher-order residual moments for Speaker Identification

Sahidullah, Md; Chakroborty, Sandipan; Saha, Goutam

doi:10.1504/ijbm.2010.035450

Cited by 8 publications (4 citation statements)

References 43 publications (54 reference statements)


“…The results of the research may be generalized to a new finding that one efficient parameter (here the spectral slope) derived from a suitable subband of smoothed long-term spectrum is sufficient to successfully discriminate against speakers. When recognizing speakers and having long utterances available, the long-term speech spectrum can complement the traditional short-term voice features such as pitch [13], mel-frequency cepstral coefficients [11], line spectral pair frequencies [19], etc. and so help to improve recognition systems.…”

Section: Discussion (mentioning, confidence: 99%)

Speaker Discrimination Using Long-Term Spectrum of Speech

Sigmund

2019

ITC


In this article, we investigate a specific long-term speech spectrum with respect to its use for speaker recognition. The long-term effect was achieved by averaging short-term autocorrelation coefficients over the whole utterance. The long-term spectrum was calculated by means of second-order linear prediction using the averaged autocorrelation coefficients. First, the speaker discriminability of 32 individual parameters was evaluated by combining spectral energy and spectral slope in eight different frequency bands covering the range 0–4 kHz (seven narrow non-overlapping subbands and one band spanning the full range). Then, the four subbands with the greatest discriminative capability were selected for speaker recognition. Together, these subbands cover 0–1.2 kHz. In the main experiments, text-independent speaker recognition based on relative Euclidean distance was performed in each single subband as well as in combinations of 2 to 4 subbands, using two types of speech data: complete continuous speech and the voiced part of the same speech. Voiced speech appears to be generally more effective for speaker recognition using the long-term speech spectrum. The best recognition rates, 91.7% on complete speech and 100% on voiced speech, were achieved with optimal pairs of subbands. The long-term speech spectrum can complement traditional voice features.
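The long-term spectrum described in this abstract (short-term autocorrelation averaged over the utterance, followed by second-order linear prediction) can be sketched roughly as follows. This is a minimal illustration, not Sigmund's implementation. The following are all assumptions: the 8 kHz sampling rate, 256-sample Hamming frames with 50% overlap, the least-squares definition of spectral slope, the use of mean dB level as a stand-in for spectral energy, and the names `frame_autocorr`, `long_term_lp2_spectrum` and `band_slope_and_energy`.

```python
import numpy as np

# Assumed analysis settings (not taken from the paper): 8 kHz audio,
# 256-sample Hamming frames, 50% overlap.
def frame_autocorr(x, frame_len=256, hop=128, max_lag=2):
    """Short-term autocorrelation r[0..max_lag] of each windowed frame."""
    win = np.hamming(frame_len)
    rows = []
    for start in range(0, len(x) - frame_len + 1, hop):
        f = x[start:start + frame_len] * win
        rows.append([np.dot(f[:frame_len - lag], f[lag:]) for lag in range(max_lag + 1)])
    return np.array(rows)

def long_term_lp2_spectrum(x, fs=8000, n_freqs=512):
    """Long-term spectrum (dB) from utterance-averaged autocorrelation and order-2 LP."""
    r = frame_autocorr(x).mean(axis=0)  # average r[0], r[1], r[2] over all frames
    # Order-2 normal equations: [[r0, r1], [r1, r0]] @ [a1, a2] = [r1, r2]
    a1, a2 = np.linalg.solve([[r[0], r[1]], [r[1], r[0]]], r[1:3])
    gain2 = r[0] - a1 * r[1] - a2 * r[2]  # prediction-error power
    freqs = np.linspace(0.0, fs / 2, n_freqs)
    zinv = np.exp(-2j * np.pi * freqs / fs)  # z^-1 evaluated on the unit circle
    A = 1.0 - a1 * zinv - a2 * zinv ** 2     # inverse filter A(z) = 1 - a1 z^-1 - a2 z^-2
    return freqs, 10 * np.log10(gain2 / np.abs(A) ** 2)

def band_slope_and_energy(freqs, spec_db, lo, hi):
    """Hypothetical subband parameters: fitted slope (dB/kHz) and mean level (dB) in [lo, hi)."""
    m = (freqs >= lo) & (freqs < hi)
    slope = np.polyfit(freqs[m] / 1000.0, spec_db[m], 1)[0]  # least-squares line, slope term
    return slope, spec_db[m].mean()
```

For instance, `band_slope_and_energy(freqs, spec_db, 0, 1200)` would summarize the 0–1.2 kHz region that the abstract identifies as most discriminative. The paper's actual subband boundaries and its relative Euclidean distance classifier are not reproduced here.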


“…The LSP parameters are expressed as the zeroes (or roots) of P(z) and Q(z). The zeroes uniquely determine P(z), Q(z), and A(z) can be made up of P(z) and Q(z) (Sahidullah, Chakroborty, & Saha, 2010).

$$A(z) = \frac{1}{2}\left[P(z) + Q(z)\right]$$

…”
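The following Python sketch makes the relation between A(z), P(z) and Q(z) quoted above concrete. It builds P(z) and Q(z) from a set of LP coefficients, reads the line spectral frequencies off the angles of their unit-circle zeros, and checks that averaging P and Q recovers A(z). It assumes the common convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, with P(z) = A(z) + z^-(p+1) A(z^-1) and Q(z) = A(z) - z^-(p+1) A(z^-1). The function name `lpc_to_lsf` and the example pole locations are hypothetical. The sketch also omits the perceptual frequency warping referred to in the cited paper's title.

```python
import numpy as np

def lpc_to_lsf(a, tol=1e-6):
    """Line spectral frequencies (radians) of A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    `a` is [1, a1, ..., ap]. Returns (lsf, P, Q), where P and Q are coefficient
    arrays in powers of z^-1, each of length p + 2.
    """
    a_ext = np.append(np.asarray(a, dtype=float), 0.0)  # A(z) padded to degree p+1
    rev = a_ext[::-1]                                    # z^-(p+1) A(z^-1)
    P = a_ext + rev                                      # symmetric ("sum") polynomial
    Q = a_ext - rev                                      # antisymmetric ("difference") polynomial
    # For a minimum-phase A(z), all zeros of P and Q lie on the unit circle.
    angles = np.angle(np.concatenate([np.roots(P), np.roots(Q)]))
    # Keep one zero from each conjugate pair; drop the trivial zeros at z = +1 and z = -1.
    lsf = np.sort(angles[(angles > tol) & (angles < np.pi - tol)])
    return lsf, P, Q

if __name__ == "__main__":
    # Example A(z) with poles at 0.9 e^{±0.5j} and 0.8 e^{±1.5j} (illustrative values).
    a = np.real(np.poly([0.9 * np.exp(0.5j), 0.9 * np.exp(-0.5j),
                         0.8 * np.exp(1.5j), 0.8 * np.exp(-1.5j)]))
    lsf, P, Q = lpc_to_lsf(a)
    print("LSFs (rad):", lsf)
    print("A(z) == (P(z) + Q(z)) / 2:", np.allclose((P + Q) / 2, np.append(a, 0.0)))
```

The check `(P + Q) / 2 == A` holds by construction, because the reversed term cancels. This is why the zeros of P and Q, which are the p LSFs, are enough to recover A(z).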

Section: The Features Based On Linear Prediction Analysis (mentioning, confidence: 99%)

Evolutionary fusion of classifiers trained on linear prediction based features for replay attack detection

Nasersharif

Yazdani

2021

Expert Systems


Recently, features related to linear prediction (LP) analysis have been successfully used for replay attack detection, because recording and playback devices introduce imperfections into the LP-based signal. In this paper, we propose a weighted linear combination of classifier scores for replay attack detection, where the classifiers, including Gaussian mixture models (GMMs) and support vector machines (SVMs), are trained on a variety of LP and LP residual-based features. In this way, we can benefit from all of the LP-related features when we combine classifiers trained on these features. We determine the classifier weights using two evolutionary algorithms: a genetic algorithm and particle swarm optimization. Furthermore, we propose a new feature based on LP residual analysis of Mel sub-band energies. We also propose a deep structure for extracting deep features from LP-based coefficients that takes the class labels (genuine or spoofed speaker) into account in the feature extraction process. Results of our classifier system on the ASVspoof 2017 version 2 dataset show equal error rates of 0.3% and 4.8% for its development and evaluation subsets, respectively. We also applied the proposed replay attack detection method to an ASV system and obtained acceptable results.
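The LP residual on which such features rest is the output of the inverse filter A(z) applied to the speech signal. Below is a minimal sketch that assumes autocorrelation-method LP via the Levinson-Durbin recursion. It uses skewness and kurtosis as example higher-order moments of the residual. The exact residual features used in the cited works may differ. The names `lpc_autocorr` and `residual_moments` are hypothetical, and the sketch processes one whole segment rather than frame by frame.

```python
import numpy as np

def lpc_autocorr(frame, order):
    """a = [1, a1, ..., ap] for A(z) = 1 + sum_k a_k z^-k (autocorrelation method, Levinson-Durbin)."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - lag], frame[lag:]) for lag in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]  # prediction-error power, updated at each order
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]                 # update lower-order coefficients
        a[i] = k
        err *= 1.0 - k * k
    return a

def residual_moments(x, order=10):
    """LP residual of x and two example higher-order moments (illustrative choice)."""
    a = lpc_autocorr(x * np.hamming(len(x)), order)
    e = np.convolve(x, a)[:len(x)]  # inverse filter: e[n] = sum_k a_k x[n-k]
    e = e[order:]                   # discard start-up samples with incomplete history
    z = (e - e.mean()) / e.std()
    # Skewness and (non-excess) kurtosis; a Gaussian residual gives roughly 0 and 3.
    return a, e, {"skewness": float(np.mean(z ** 3)), "kurtosis": float(np.mean(z ** 4))}
```

For an autoregressive signal whose order matches `order`, the estimated coefficients approach the generating ones and the residual approaches the driving noise. Departures of the moments from their Gaussian values therefore reflect structure that the LP model did not capture.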


“…Line Spectral Pairs (LSP) are popular alternative representation of Linear Prediction Coefficients (LPC). LSPs are useful for speech coding as they have some properties that make them superior to direct quantization of LPCs [23].…”

Section: Line Spectral Frequency (LSP) (mentioning, confidence: 99%)

Gender Identification from Arabic Speech Using Machine Learning

Hamdi

Moussaoui

Oussalah

et al. 2020

Modelling and Implementation of Complex Systems


Speech recognition is increasingly used in real-world applications. One interesting application is automatic gender recognition, which aims to recognize male and female voices from short speech samples. This can be useful in applications such as automatic dialogue systems, system verification, prediction of demographic attributes (e.g., age, location) and estimating a person's emotional state. This paper focuses on gender identification from the publicly available Arabic Natural Audio Dataset (ANAD) using an ensemble-classifier-based approach. More specifically, we first extended the original ANAD to include gender label information through a manual annotation task. Next, to optimize the feature engineering process, a three-stage machine learning approach is devised. In the first phase, we restricted the features to the two widely used ones, namely MFCC and fundamental frequency coefficients. In the second phase, six distinct acoustic features were employed. Finally, in the third phase, features were ranked according to their associated weights in a Random Forest classifier, and the best ones were selected. The latter approach enabled us to achieve a classification rate of 96.02% on the test set with a linear SVM classifier.
