Speaker recognition is the area of speech technology which attempts to identify the speaker. There are two distinct approaches in this domain. One is based on blind DSP processing, where we deal with the signal regardless of its content. The other is phonetic-based, where individual phonemic units are identified, processed and matched on the basis of certain features. Our research is related to the second approach, and we have tried to identify high-level phonemic features like style etc. from the processing of lower formants. We have tried to process these formants on transition boundaries and attempted to demonstrate that each person has a specific pattern of rise and fall in the transition region. Our basic hypothesis is: the transition of formants at the consonant-vowel boundaries is controlled by the brain, and it contains a specific signature of the speaker. Experiments have shown that our method has a success rate of 90%. Moreover, the system gives desirable results even in a noisy environment, due to the choice of lower formants.
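The abstract tracks lower formants across consonant-vowel transition regions but gives no algorithm. A standard way to estimate formants from a short speech frame is the autocorrelation-method LPC followed by root-finding on the prediction polynomial; the sketch below is that generic recipe in numpy, not the paper's implementation, and all function names and parameter choices (model order, 90 Hz floor) are illustrative assumptions.

```python
import numpy as np

def lpc(signal, order):
    """LPC coefficients via the autocorrelation method (Levinson-Durbin)."""
    n = len(signal)
    r = np.correlate(signal, signal, mode="full")[n - 1:n + order]  # lags 0..order
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                                  # reflection coefficient
        a_prev = a.copy()
        a[1:i + 1] = a_prev[1:i + 1] + k * a_prev[i - 1::-1][:i]
        err *= (1.0 - k * k)
    return a

def formants(frame, fs, order=8):
    """Estimate formant frequencies (Hz) from the roots of the LPC polynomial."""
    roots = np.roots(lpc(frame, order))
    roots = roots[np.imag(roots) > 0]           # one root of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)  # pole angle -> frequency
    return np.sort(freqs[freqs > 90])           # drop near-DC artefacts
```

Running `formants` on successive frames around a detected consonant-vowel boundary would yield the rise/fall trajectories of the lower formants that the abstract describes.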
The impact of short utterances on speaker recognition is of significant importance. Despite advances in short utterance speaker recognition (SUSR), text dependence and the role of phonemes in carrying speaker information need further investigation. This paper presents a novel method of using vowel categories for SUSR. We define Vowel Categories (VCs) covering the Chinese and English languages. After recognition and extraction of phonemes, the obtained vowels are divided into VCs, which are then used to develop a Universal Background VC Model (UBVCM) for each VC. A conventional GMM-UBM system is used for training and testing. The proposed categories give minimum EERs of 13.76%, 14.03% and 16.18% for test utterances of 3, 2 and 1 second, respectively. Experimental results show that in text-dependent SUSR, significant speaker-specific information is present at the phoneme level. The similar properties of phonemes can be exploited such that accurate speech recognition is not required; rather, phoneme categories can be used effectively for SUSR. It is also shown that vowels carry a large amount of speaker information, which remains undisturbed when VCs are employed.
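The abstract relies on a conventional GMM-UBM system: a Universal Background Model is trained on pooled data, a speaker model is derived from it by relevance-MAP adaptation of the mixture means, and test utterances are scored by an average log-likelihood ratio. The following is a minimal diagonal-covariance sketch of that standard recipe in numpy; the function names, the relevance factor `tau=10`, and all parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gmm_loglik(frames, w, mu, var):
    """Per-frame log-likelihood under a diagonal-covariance GMM."""
    diff = frames[:, None, :] - mu[None, :, :]                  # (T, M, D)
    log_comp = (np.log(w)
                - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1)
                - 0.5 * np.sum(diff ** 2 / var, axis=2))        # (T, M)
    m = log_comp.max(axis=1, keepdims=True)                     # log-sum-exp
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True)))[:, 0]

def map_adapt_means(w, mu, var, frames, tau=10.0):
    """Relevance-MAP adaptation of UBM means toward enrollment frames."""
    diff = frames[:, None, :] - mu[None, :, :]
    log_comp = (np.log(w)
                - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1)
                - 0.5 * np.sum(diff ** 2 / var, axis=2))
    post = np.exp(log_comp - log_comp.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)                     # responsibilities
    n = post.sum(axis=0)                                        # soft counts
    ex = post.T @ frames / np.maximum(n[:, None], 1e-8)         # first-order stats
    alpha = (n / (n + tau))[:, None]                            # relevance weights
    return alpha * ex + (1 - alpha) * mu

def llr_score(test_frames, w, ubm_mu, spk_mu, var):
    """Average log-likelihood ratio: adapted speaker model vs. UBM."""
    return (gmm_loglik(test_frames, w, spk_mu, var).mean()
            - gmm_loglik(test_frames, w, ubm_mu, var).mean())
```

In the category-based setup the abstract describes, one such UBM would be trained per vowel category, and each test vowel would be scored against the matching category model.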
Information about speech units such as vowels, consonants and syllables can serve as knowledge in text-independent Short Utterance Speaker Recognition (SUSR), much as it does in text-dependent speaker recognition. In such tasks, the data for each speech unit, especially at recognition time, is often insufficient. Hence it is not practical to use the full set of speech units, because some of the units might not be well trained. To solve this problem, a method using speech unit categories rather than individual phones is proposed for SUSR: similar speech units are grouped together, alleviating the sparse-data problem. We define Vowel, Consonant, and Syllable Categories (VC, CC and SC) with Standard Chinese (Putonghua) as a reference. A speech utterance is recognized into VC, CC and SC sequences, which are used to train a Universal Background Model (UBM) for each speech unit category in the training procedure and to perform speech-unit-category-dependent speaker recognition at test time. Experimental results on a Gaussian Mixture Model-Universal Background Model (GMM-UBM) based system give relative equal error rate (EER) reductions of 54.50% and 40.95% from the minimum EERs of the VCs and SCs, respectively, for 2 seconds of test utterance, compared with existing SUSR systems.
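Both abstracts report results as equal error rates. The EER is the operating point where the false acceptance rate (impostor trials accepted) equals the false rejection rate (target trials rejected); a minimal threshold-sweep sketch over two score lists is shown below. The helper name and the toy score values are illustrative, not the papers' trial data.

```python
import numpy as np

def eer(target_scores, impostor_scores):
    """Equal error rate: sweep candidate thresholds, return (FAR + FRR) / 2
    at the threshold where the two rates are closest."""
    target_scores = np.asarray(target_scores, dtype=float)
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    best_gap, best = np.inf, None
    for t in np.unique(np.concatenate([target_scores, impostor_scores])):
        far = np.mean(impostor_scores >= t)   # false acceptance rate
        frr = np.mean(target_scores < t)      # false rejection rate
        if abs(far - frr) < best_gap:
            best_gap, best = abs(far - frr), (far + frr) / 2
    return best
```

Perfectly separated scores give an EER of 0, while fully overlapping score distributions give 0.5; the reported 13-16% EERs sit between these extremes.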