To improve the performance of speaker identification systems, an effective and robust feature extraction method for speech processing is proposed that operates in both clean and noisy environments. To capture the characteristics of the signal, Mel-frequency cepstral coefficients (MFCCs) combined with RASTA (relative spectral) filtering are computed on the wavelet sub-band channels. The proposed feature extraction algorithm is then evaluated on a speech database for text-dependent and text-independent speaker identification using Gaussian Mixture Model (GMM) and Vector Quantization (VQ) identifiers. GMMs were used for the recognition stage because they gave a higher recognition rate on the speakers' features than VQ. Several popular existing feature extraction methods (MFCC, LPC, LPC+DWT, and MFCC+RASTA) are also evaluated for comparison. Comparison with these conventional methods shows that the proposed method both reduces the influence of noise and improves recognition accuracy, and that it performs well in noisy environments. A recognition rate of 98.63% was obtained using the proposed feature extraction technique.
Keywords: GMM, VQ, RASTA, MFCC, LPC, DWT, Recognition Rate.
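The wavelet-channel and RASTA steps of the feature pipeline can be illustrated with a small sketch. It is not the authors' implementation. It assumes a one-level Haar DWT for the wavelet channels, because the wavelet family and decomposition depth are not stated here. It also assumes the classic RASTA band-pass filter, with numerator 0.1(2 + z^-1 - z^-3 - 2z^-4) and pole 0.98 as in the original RASTA formulation (some implementations use 0.94), applied over time to each cepstral coefficient trajectory. The MFCC computation for each sub-band is omitted. The helper names `haar_dwt`, `rasta_filter`, and `rasta_cepstra` are hypothetical.

```python
import math


def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform (assumed wavelet).

    Returns (approximation, detail): the low-pass and high-pass sub-band
    channels, each half the length of the (zero-padded, even-length) input.
    """
    signal = list(signal)
    if len(signal) % 2:
        signal.append(0.0)  # zero-pad odd-length input to even length
    s = 1.0 / math.sqrt(2.0)  # orthonormal scaling preserves signal energy
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail


def rasta_filter(trajectory, pole=0.98):
    """Apply the classic RASTA IIR band-pass filter to one feature trajectory.

    y[t] = pole * y[t-1] + 0.1 * (2x[t] + x[t-1] - x[t-3] - 2x[t-4])
    The four-frame advance of the original filter is dropped, so the output
    is delayed; samples before t = 0 are treated as zero.
    """
    b = [0.2, 0.1, 0.0, -0.1, -0.2]  # numerator taps; they sum to zero
    out = []
    prev = 0.0
    for t in range(len(trajectory)):
        acc = sum(b[k] * trajectory[t - k] for k in range(len(b)) if t - k >= 0)
        prev = pole * prev + acc
        out.append(prev)
    return out


def rasta_cepstra(frames):
    """RASTA-filter each coefficient of a (frames x coefficients) matrix over time."""
    if not frames:
        return []
    n_coef = len(frames[0])
    columns = [rasta_filter([f[c] for f in frames]) for c in range(n_coef)]
    return [[columns[c][t] for c in range(n_coef)] for t in range(len(frames))]
```

Because the numerator taps sum to zero, a constant offset in a coefficient trajectory decays to zero after a short transient. A fixed convolutional channel appears as exactly such an additive constant in the cepstral domain, so this property is what makes RASTA useful against channel distortion. Since the DCT inside MFCC is linear, filtering cepstral trajectories is equivalent to filtering the log mel-energy trajectories.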
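The GMM and VQ identifiers compared above differ mainly in how they score a test utterance. The following sketch illustrates closed-set speaker identification under the assumption that the speaker models are already trained: EM for the diagonal-covariance GMMs, and a clustering algorithm such as k-means or LBG for the VQ codebooks. Training is not shown. A GMM identifier picks the speaker whose model gives the highest average log-likelihood. A VQ identifier picks the speaker whose codebook gives the lowest average quantization distortion. The recognition rate is taken here as the percentage of test utterances assigned to the correct speaker. All function names and the toy model parameters are hypothetical and are not taken from the paper.

```python
import math


def gmm_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        comp = []
        for w, mu, var in zip(weights, means, variances):
            ll = math.log(w)
            for xi, m, v in zip(x, mu, var):
                ll -= 0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            comp.append(ll)
        top = max(comp)  # log-sum-exp over components for numerical stability
        total += top + math.log(sum(math.exp(c - top) for c in comp))
    return total / len(frames)


def vq_distortion(frames, codebook):
    """Average squared Euclidean distance from each frame to its nearest codeword."""
    return sum(
        min(sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in codebook)
        for x in frames
    ) / len(frames)


def identify_gmm(frames, models):
    """Return the speaker whose GMM (weights, means, variances) scores highest."""
    return max(models, key=lambda spk: gmm_log_likelihood(frames, *models[spk]))


def identify_vq(frames, codebooks):
    """Return the speaker whose codebook yields the lowest distortion."""
    return min(codebooks, key=lambda spk: vq_distortion(frames, codebooks[spk]))


def recognition_rate(predicted, actual):
    """Percentage of test utterances assigned to the correct speaker."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


if __name__ == "__main__":
    # Hypothetical toy models for two speakers with 2-D feature vectors.
    models = {
        "alice": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
        "bob": ([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
    }
    codebooks = {"alice": [[0.0, 0.0], [1.0, 1.0]], "bob": [[5.0, 5.0]]}
    utterance = [[0.2, 0.1], [0.9, 1.1]]
    print(identify_gmm(utterance, models))     # alice
    print(identify_vq(utterance, codebooks))   # alice
    print(recognition_rate(["a", "b", "a", "a"], ["a", "b", "b", "a"]))  # 75.0
```

The GMM models each speaker's feature distribution as a weighted sum of Gaussians. VQ represents it only by a finite set of centroids. This richer density model is one common explanation for the GMM's higher recognition rate.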
I. INTRODUCTION
Speaker recognition has been an active research field over the past decades. Recognizing a particular speaker relies on the speaker-specific information carried in the speech wave. Much research in past years has aimed at building an ideal system that can understand continuous speech in real time, from different speakers, and in any environment. A speech signal carries information about the speaker's gender, emotion, and identity, as well as the language being spoken. Speech signals can therefore be used in many speech recognition and speech processing applications, especially security and authentication. The significance of speech recognition lies in its simplicity: operating a device by voice is easy, which benefits applications such as security devices, household appliances, cellular phones, ATMs, and computers [3]. Several difficulties arise during speaker recognition, notably unwanted noise from the speaker's surrounding environment and speaker variability such as gender, speaking style, and speed of speech [5]. Speaker recognition is divided into two tasks: speaker identification and speaker verification. Speaker identification determines which registered speaker produced a given speech input, while speaker verification automatically determines whether a person really is who he or she claims to be. In this article, we are primarily interested in speaker recognition in the text-dependent mode, for isolated words and continuous speech in the English language. Speaker recognition systems are classified as text-dependent and text ind...