State-of-the-art in speaker recognition

Faundez-Zanuy, Marcos; Monte-Moreno, Enric

doi:10.1109/maes.2005.1432568

Cited by 75 publications (26 citation statements)

References 6 publications


“…Related to the acoustic modeling, all current speech recognition systems are based on Hidden Markov Models (HMMs). These models are widely used in many recognition problems [6,7]. For each allophone (a characteristic pronunciation of a phoneme), one HMM is estimated through a training process carried out on a speech database.…”

Section: Speech Recognition (mentioning)

confidence: 99%
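The passage above describes HMM-based acoustic modeling. To illustrate how a trained HMM assigns a likelihood to an observation sequence, here is a minimal sketch of the standard forward algorithm for a discrete-observation HMM. The toy two-state model (`INIT`, `TRANS`, `EMIT`) and the function name `forward_likelihood` are hypothetical. Practical allophone models use continuous emission densities (typically GMMs) over acoustic feature vectors rather than discrete symbols.

```python
from typing import Sequence


def forward_likelihood(
    init: Sequence[float],
    trans: Sequence[Sequence[float]],
    emit: Sequence[Sequence[float]],
    obs: Sequence[int],
) -> float:
    """Return P(obs | model) for a discrete HMM using the forward algorithm.

    init[i]     : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability of emitting symbol o while in state i
    """
    n_states = len(init)
    # alpha[i] = P(o_1..o_t, state_t = i); initialise with the first symbol.
    alpha = [init[i] * emit[i][obs[0]] for i in range(n_states)]
    for o in obs[1:]:
        # Sum over all predecessor states, then emit the current symbol.
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][o]
            for j in range(n_states)
        ]
    # Total likelihood: sum over the possible final states.
    return sum(alpha)


# Hypothetical left-to-right model with two states and two output symbols.
INIT = [1.0, 0.0]
TRANS = [[0.6, 0.4], [0.0, 1.0]]
EMIT = [[0.9, 0.1], [0.2, 0.8]]
```

With the toy model, `forward_likelihood(INIT, TRANS, EMIT, [0, 0, 1])` is about 0.2423. On long utterances plain probabilities underflow, so real recognizers work in the log domain or rescale `alpha` at each step.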

Research of the coating textiles’ coating Gram weight measurement system based on infrared

Zhang, Du, Liao

2015

Advances in Engineering Materials and Applied Mechanics


This paper proposes several speech technology improvements for increasing robustness, reliability and ergonomics in speech interfaces for controlling aerial vehicles. These improvements consist of including a statistical language model for increasing robustness against spontaneous speech, incorporating confidence measures for evaluating the on-line performance of the speech engines (better reliability), and flexible response generation for improving the interface ergonomics. This paper includes a detailed description of the speech control interface developed as a result of the collaboration between the GTH (Grupo de Tecnología del Habla, or Speech Technology Group) at Universidad Politécnica de Madrid (UPM) and the company Boeing Research and Technology Europe under contract No. 206/05. This interface includes modules that perform speech recognition, natural language understanding and response generation via a speech synthesizer. In the system evaluation, the final results showed 96.4% Word Accuracy and 92.2% Semantic Concept Accuracy. This paper also provides a state-of-the-art review of the use of Speech Technology for controlling aerial vehicles, comparing the main initiatives carried out. A significant conclusion of this work is that Speech Technology is now ready to be considered as a new modality (in parallel with traditional ones) for introducing high-level commands while the controller is carrying out other actions when interacting with these control systems. In critical applications (such as this one), the best performance of this technology is achieved when all the configuration possibilities of the speech engines are accessible and the speech interface is designed in collaboration with Speech Technology experts.
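The abstract reports results as Word Accuracy. A minimal sketch, assuming the usual definition WA = (N - S - D - I) / N, where N is the number of reference words and S, D and I are the substitutions, deletions and insertions of a minimum edit-distance alignment (the exact scoring tool used in the paper is not stated). The function name `word_accuracy` and the example command phrase are hypothetical.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word Accuracy = (N - S - D - I) / N, where S + D + I is the minimum
    word-level edit (Levenshtein) distance between reference and hypothesis."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # deletion
                dist[i][j - 1] + 1,        # insertion
                dist[i - 1][j - 1] + sub,  # match or substitution
            )
    n = len(ref)
    return (n - dist[len(ref)][len(hyp)]) / n
```

For example, `word_accuracy("climb to flight level two zero zero", "climb to flight level two zero")` returns 6/7 (one deletion out of seven reference words). Because insertions are counted, Word Accuracy can drop below zero when a recognizer produces many spurious words.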


“…The first ones consist of the development of short-term features (such as LPCC or MFCC), for instance by means of signal decomposition methods (wavelets, Independent Component Analysis). Other techniques aim to exploit other levels of representation, such as phonetic, prosodic, idiolectal, dialogic or semantic information (Faundez-Zanuy and Monte-Moreno, 2005). These features are extracted from long-term physical traits and are usually fused with the traditional short-term spectral features.…”

Section: Feature Extraction (mentioning)

confidence: 99%
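LPCC is named above as a typical short-term feature. A minimal single-frame sketch of the classic pipeline: autocorrelation, the Levinson-Durbin recursion for the linear-prediction coefficients, then the standard LPC-to-cepstrum recursion. Assumptions: the frame is already windowed (e.g. Hamming), the predictor convention is x[n] ≈ Σ a_k x[n-k], and the gain term c_0 is omitted. The helper names and default order are hypothetical.

```python
from typing import List, Sequence, Tuple


def autocorrelation(frame: Sequence[float], max_lag: int) -> List[float]:
    """r[k] = sum_n x[n] * x[n - k] for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Solve the normal equations for a_1..a_p, where x[n] is predicted as
    sum_k a_k * x[n - k]. Returns (a, final prediction error)."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a: Sequence[float], n_ceps: int) -> List[float]:
    """Convert predictor coefficients a_1..a_p into cepstral coefficients c_1..c_n."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] (gain term) is left at zero here
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def lpcc(frame: Sequence[float], order: int = 12, n_ceps: int = 12) -> List[float]:
    """LPCC vector for one windowed frame."""
    r = autocorrelation(frame, order)
    a, _ = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, n_ceps)
```

As a sanity check, a decaying exponential 0.9ⁿ behaves like a single-pole model, so first-order LPCC should give cepstral coefficients close to 0.9ⁿ / n, i.e. roughly 0.9, 0.405, 0.243. In a real front end, `lpcc` would be applied to every overlapping frame of the utterance.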

“…Fortunately, speech offers a richer and wider range of possibilities than other biometric traits such as fingerprint, iris, hand geometry, face, etc. For instance, you can use a text-dependent system (Faundez-Zanuy and Monte-Moreno, 2005) and ask the user to utter a specific sentence. Speaker recognition does not offer the same robustness and precision as other biometric traits such as fingerprint and iris.…”

Section: Introduction (mentioning)

confidence: 99%
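In a text-dependent system the enrolment and test utterances contain the same phrase, so one classic approach (not necessarily the one used by the cited authors) is template matching with dynamic time warping (DTW) over frame-level feature vectors. A minimal sketch, assuming Euclidean frame distances, the usual match/insert/delete steps and normalisation by the summed sequence lengths. The function names and the decision threshold are hypothetical and would have to be tuned on development data.

```python
import math
from typing import Sequence

Frame = Sequence[float]


def dtw_distance(template: Sequence[Frame], probe: Sequence[Frame]) -> float:
    """Length-normalised DTW distance between two sequences of feature vectors."""
    n, m = len(template), len(probe)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(template[i - 1], probe[j - 1])  # Euclidean frame distance
            # Best of: advance both sequences, advance template only, advance probe only.
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m] / (n + m)


def verify(template: Sequence[Frame], probe: Sequence[Frame], threshold: float) -> bool:
    """Accept the identity claim when the warped distance is below a
    (hypothetical, application-tuned) threshold."""
    return dtw_distance(template, probe) < threshold
```

The warping absorbs differences in speaking rate: a time-stretched copy of the template, such as `[[0.0], [0.0], [1.0], [1.0], [2.0]]` against `[[0.0], [1.0], [2.0]]`, has a DTW distance of zero.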

Maximum likelihood linear programming data fusion for speaker recognition

Monte-Moreno, Chétouani, Faundez-Zanuy et al.

2009

Speech Communication


Biometric system performance can be improved by means of data fusion. Several kinds of information can be fused in order to obtain a more accurate classification (identification or verification) of an input sample. In this paper we present a method for computing the weights of a weighted-sum score fusion by means of a likelihood model. The maximum likelihood estimation is posed as a linear programming problem. Each score is derived from a GMM classifier working on a different feature extractor. Our experimental results assessed the robustness of the system against changes over time (different sessions) and against a change of microphone. The improvements obtained were significant (error bars of two standard deviations) with respect to a uniform weighted sum, a uniform weighted product, and the best single classifier. The computational cost of the proposed method scales with the number of scores to be fused in the same way as the simplex method for linear programming.
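The abstract compares a learned weighted sum against uniform weighted-sum and weighted-product fusion. The sketch below shows only the fusion rules themselves for a given weight vector; the paper's actual contribution, estimating the weights by maximum likelihood posed as a linear program, is not reproduced here. It assumes the per-classifier scores are on a comparable scale (and strictly positive for the product rule). The function names and the example weights are hypothetical.

```python
import math
from typing import List, Sequence


def weighted_sum(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Fused score = sum_i w_i * s_i."""
    if len(scores) != len(weights):
        raise ValueError("one weight per score is required")
    return sum(w * s for w, s in zip(weights, scores))


def weighted_product(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Fused score = prod_i s_i ** w_i (scores must be positive, e.g. likelihoods)."""
    if len(scores) != len(weights):
        raise ValueError("one weight per score is required")
    return math.prod(s ** w for w, s in zip(weights, scores))


def uniform(n: int) -> List[float]:
    """Uniform weights, i.e. the baseline fusion the abstract compares against."""
    return [1.0 / n] * n
```

For three classifier scores `[0.2, 0.8, 0.5]`, hypothetical weights `[0.2, 0.5, 0.3]` give a fused score of 0.59, while `uniform(3)` gives 0.5. For positive likelihood scores, the logarithm of the weighted product equals the weighted sum of the log-likelihoods, so the two rules differ mainly in the domain in which the scores are combined.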


“…However, in most applications, the alternative hypothesis model is ill-defined and difficult to characterize a priori. For example, in speaker verification [3][4][5][6][7], the problem of determining whether a speaker is who he or she claims to be is normally formulated as follows: given an unknown utterance U, determine whether H₀: U is from the target speaker, or H₁: U is not from the target speaker.…”

Section: Introduction (mentioning)

confidence: 99%
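The H₀/H₁ formulation is commonly decided with a log-likelihood ratio test: accept H₀ when log p(U | λ_target) - log p(U | λ_alt) ≥ θ. Because the alternative model is ill-defined, it is usually approximated, for example by a background model trained on many speakers. A minimal sketch, assuming diagonal-covariance GMMs, frames treated as independent and scores averaged per frame. The class and function names, all model parameters and the threshold θ are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DiagGMM:
    weights: List[float]          # mixture weights, summing to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # one diagonal variance vector per component

    def frame_log_likelihood(self, x: Sequence[float]) -> float:
        """log p(x | GMM), combined with log-sum-exp for numerical stability."""
        comp = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp.append(ll)
        top = max(comp)
        return top + math.log(sum(math.exp(v - top) for v in comp))

    def avg_log_likelihood(self, frames: Sequence[Sequence[float]]) -> float:
        """Average per-frame log-likelihood, treating frames as independent."""
        return sum(self.frame_log_likelihood(f) for f in frames) / len(frames)


def log_likelihood_ratio(frames, target: DiagGMM, alternative: DiagGMM) -> float:
    """LLR of H0 (target speaker) versus H1 (alternative model)."""
    return target.avg_log_likelihood(frames) - alternative.avg_log_likelihood(frames)


def accept_h0(frames, target: DiagGMM, alternative: DiagGMM, theta: float = 0.0) -> bool:
    """Decide H0 (U is from the target speaker) when the LLR is at least theta."""
    return log_likelihood_ratio(frames, target, alternative) >= theta
```

With a one-dimensional target model centred at 0 and an alternative centred at 3 (both with unit variance), a frame at 0 yields an LLR of 4.5 and is accepted, while a frame at 3 yields -4.5 and is rejected. Raising or lowering θ trades false acceptances against false rejections.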