Economical speaker recognition from a degraded human voice signal remains a challenge. This article presents the results of an experiment aimed at improving the feature extraction method for effective speaker identification from degraded human audio signals with the help of data science. Every speaker's voice has distinctive characteristics: human ears can easily pick up these characteristics and tell one speaker from another. The Mel-Frequency Cepstral Coefficient (MFCC) method gives machines the same capability and is widely used for human voice feature extraction. In our experiment we combined MFCC with Linear Predictive Coding (LPC) to achieve better speaker recognition accuracy. MFCC first divides the signal into frames and then computes cepstral coefficients for each frame, converting the human audio signal into numerical audio features that an Artificial Intelligence (AI) based speaker recognition system can use to identify the speaker efficiently. This article describes how audio features can be extracted effectively from a degraded human voice signal. In our experiment we observed an improved Equal Error Rate (EER) and True Match Rate (TMR) due to a high sampling rate and a low frequency range for the mel-scale triangular filter. The article also examines the effect of pre-emphasis on speaker recognition when the audio signal carries high background noise.
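The pre-emphasis step mentioned above is a simple first-order high-pass filter applied before framing. A minimal sketch follows; the coefficient 0.97 is a common default in MFCC pipelines, not a value taken from this article:

```python
def pre_emphasis(signal, alpha=0.97):
    """Boost high frequencies before framing: y[n] = x[n] - alpha * x[n-1].

    `signal` is a sequence of samples; the first sample passes through
    unchanged because it has no predecessor.
    """
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]
```

Because the filter attenuates the slowly varying part of the waveform, its effect on recognition accuracy depends on whether the background noise is concentrated at low or high frequencies, which is why the article studies it separately for high-noise signals.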
A speaker's voice is one of that person's unique identities. Nowadays not only humans but also machines can identify people by their voices: machines measure different audio properties of the human voice and distinguish one speaker from another. Speaker recognition is still challenging with a degraded human voice and a limited dataset, and a speaker can be identified effectively only when feature extraction from the voice is accurate. The Mel-Frequency Cepstral Coefficient (MFCC) is the most commonly used method for human voice feature extraction. We introduce an improved feature extraction method for effective speaker recognition from a degraded human audio signal. This article presents experimental results of a modified MFCC combined with a Gaussian Mixture Model (GMM) on a purpose-built degraded human voice dataset. MFCC transforms the human audio signal into numerical audio features, which a data science model uses to recognize the speaker efficiently. The experiment uses degraded human voice recordings with high background noise, and also examines the impact of Sampling Frequency (SF) on the overall speaker identification process when the Signal-to-Noise Ratio (SNR) is low (as low as 1 dB). With the modified MFCC, we observed improved speaker recognition at an SNR as low as 1 dB, owing to a high SF and a low frequency range for the mel-scale triangular filter.
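The "frequency range for the mel-scale triangular filter" refers to the band of frequencies over which the filterbank's triangular filters are placed, spaced uniformly on the mel scale. A minimal sketch, using the standard HTK-style mel formula (the specific low/high cutoffs and filter count here are illustrative, not the article's values):

```python
import math

def hz_to_mel(f_hz):
    """Standard mel-scale mapping: mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_edges(low_hz, high_hz, n_filters):
    """Edge/center frequencies (Hz) for n_filters triangular filters.

    Returns n_filters + 2 points spaced uniformly on the mel scale
    between low_hz and high_hz; each filter i is a triangle rising
    from point i, peaking at point i+1, and falling to point i+2.
    """
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (n_filters + 1)
    return [mel_to_hz(low_mel + i * step) for i in range(n_filters + 2)]
```

Lowering `high_hz` concentrates all the triangles in the low-frequency band, which is the "low frequency range" choice the article reports as beneficial for degraded, low-SNR speech.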