This paper investigates the fusion of Mel-frequency cepstral coefficients (MFCC) and statistical pH features to improve the performance of speaker verification (SV) in non-stationary noise conditions. The -integrated Gaussian Mixture Model ( -GMM) classifier is adopted for speaker modeling. Two different approaches are applied to reduce the effects of noise corruption in the SV task: speech enhancement and multi-style training (MT). The spectral subtraction with minimum statistics (MS/SS) and the optimally-modified log-spectral amplitude with improved minima controlled recursive averaging (IMCRA/OMLSA) are examined for the speech enhancement procedure.
The MT techniques are based on colored (Colored-MT), white (White-MT) and narrow-band (Narrow-MT) noises. Six real non-stationary noises, collected from different acoustic sources, are used to corrupt the TIMIT speech database in four different signal-to-noise ratios (SNR). The index of non-stationarity (INS) is chosen for the stationarity tests of the acoustic noises. Complementary SV experiments are conducted in realistic noisy conditions using the MIT database. The results show that the best SV accuracy was obtained with the MFCC + pH features fusion, the MS/SS and the Colored-MT.Index Terms-Features fusion, Hurst exponent, multi-style training, non-stationary acoustic noise, speaker verification, speech enhancement, -GMM.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.