This paper is concerned with verification effectiveness in open-set, text-independent speaker identification. The study includes an analysis of the characteristics of this mode of speaker recognition and of the potential causes of errors. The use of well-known score normalisation techniques for the purpose of enhancing the reliability of the process is described, and their relative effectiveness is investigated experimentally. The experiments are based on the dataset proposed for the 1-speaker detection task of the NIST Speaker Recognition Evaluation 2003. The experimental results demonstrate that significant benefits are achieved by using score normalisation in open-set identification, and that the extent of these benefits depends strongly on the type of approach adopted. The results also show that better performance can be achieved by using cohort normalisation methods. In particular, the unconstrained cohort method with a relatively small cohort size appears to outperform all other approaches.
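To make the idea of cohort normalisation in open-set identification concrete, the sketch below normalises each enrolled model's log-likelihood score by the mean score of a cohort of competing models, then accepts the best candidate only if its normalised score clears a threshold. It is a minimal illustration under stated assumptions, not the paper's method: the cohort for each model is taken here as the `cohort_size` best-scoring other models for the same test utterance (one common reading of an "unconstrained" cohort), normalisation uses the mean cohort log-likelihood, and the function names and threshold are hypothetical.

```python
import numpy as np


def cohort_normalise(scores: np.ndarray, cohort_size: int) -> np.ndarray:
    """Normalise each model's log-likelihood score against a cohort.

    scores[i] is the log-likelihood of one test utterance given enrolled
    model i. As an illustrative assumption, the cohort for model i is the
    `cohort_size` best-scoring *other* models for this utterance, and the
    normalised score is scores[i] minus the mean cohort score.
    """
    n = scores.shape[0]
    if not 1 <= cohort_size < n:
        raise ValueError("cohort_size must be in [1, len(scores) - 1]")
    normalised = np.empty(n)
    for i in range(n):
        others = np.delete(scores, i)
        cohort = np.sort(others)[-cohort_size:]  # top-K competing models
        normalised[i] = scores[i] - cohort.mean()
    return normalised


def open_set_identify(scores, cohort_size: int, threshold: float):
    """Return the best-matching model index, or None if rejected as an unknown speaker."""
    norm = cohort_normalise(np.asarray(scores, dtype=float), cohort_size)
    best = int(np.argmax(norm))
    return best if norm[best] >= threshold else None
```

For raw scores `[-10.0, -12.0, -11.0, -20.0]` and a cohort of one, the normalised scores are `[1.0, -2.0, -1.0, -10.0]`, so model 0 is accepted at a threshold of 0.5 and rejected at 2.0. Keeping `cohort_size` small means each model competes only against its closest rivals for that utterance.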
A new approach to speaker change detection is proposed and investigated. The method, which is based on a probabilistic framework, provides an effective means of tackling the problem posed by phonetic variation in high-resolution speaker change detection. Additionally, the approach incorporates the capability to deal with the undesired effects of variations in speech characteristics. Experimental investigations conducted with clean speech and broadcast news audio show that the proposed method is significantly more effective than currently popular techniques for speaker change detection. To enhance the computational efficiency of the proposed method, modified implementation algorithms are introduced, based on the exploitation of redundant operations and a fast scoring procedure. It is shown that, through the use of the proposed fast algorithm, the computational efficiency of the approach can be increased by over 77% without a significant reduction in its accuracy. The paper discusses the principles and characteristics of the proposed speaker change detection method and provides a detailed description of its efficient implementation. The experiments investigating the performance of the proposed method and its effectiveness in relation to other approaches are described, and an analysis of the results is presented.
"This paper is a postprint of a paper submitted to and accepted for publication in IET Signal Processing and is subject to Institution of Engineering and Technology Copyright. The copy of record is available at IET Digital Library."

This study presents investigations into the effectiveness of state-of-the-art speaker verification techniques (i.e. GMM-UBM and GMM-SVM) in mismatched noise conditions. Based on experiments using white and real-world noise, it is shown that the verification performance offered by these methods is severely affected when the level of degradation in the test material differs from that in the training utterances. To address this problem, a modified realisation of the parallel model combination (PMC) method is introduced, and a new form of test normalisation (T-norm), termed condition adjusted T-norm, is proposed. It is demonstrated experimentally that the use of these techniques with GMM-UBM can significantly enhance accuracy in mismatched noise conditions. The experimental results show that the resulting relative improvement achieved for GMM-UBM (under the most severe mismatch condition considered) is in excess of 70%. Additionally, the improvement in verification accuracy achieved in this way is higher than that obtainable with the direct use of PMC with GMM-UBM. Moreover, while the accuracy of GMM-SVM can also benefit considerably from the use of these techniques, the extensive computational cost involved in this case severely limits the use of such a combined approach in practice.
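For reference, the sketch below implements conventional T-norm, the test normalisation of which condition adjusted T-norm is described as a new form. The condition adjustment itself is not specified in this abstract, so it is deliberately not implemented here. The assumption is the usual one: the same test utterance is scored against a set of impostor models, and the target score is shifted and scaled by the mean and standard deviation of those impostor scores. The function name is hypothetical.

```python
import numpy as np


def t_norm(target_score: float, impostor_scores) -> float:
    """Conventional test normalisation (T-norm) of a verification score.

    impostor_scores are the scores of the *same* test utterance against a
    set of impostor models; the target score is shifted by their mean and
    scaled by their (sample) standard deviation.
    """
    imp = np.asarray(impostor_scores, dtype=float)
    if imp.size < 2:
        raise ValueError("need at least two impostor scores")
    mu = imp.mean()
    sigma = imp.std(ddof=1)
    if sigma == 0.0:
        raise ValueError("impostor scores have zero spread")
    return float((target_score - mu) / sigma)
```

Because the statistics come from the test utterance itself, any common positive scaling and offset applied to all of its scores cancels out: `t_norm(3.0, [0.0, 2.0])` and `t_norm(-2.5, [-4.0, -3.0])` (the same scores scaled by 0.5 and shifted by -4) both give roughly 1.414. This is one reason T-norm style methods are attractive when test conditions differ from training conditions, although a real noise mismatch is not guaranteed to act as such a simple transformation.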
This letter presents an investigation into the use of a probabilistic pattern matching approach for detecting speaker changes in audio streams. The experiments are conducted using clean speech as well as broadcast news material. It is shown that, in the proposed approach, bilateral scoring is considerably more effective than unilateral scoring. Appropriate score normalisation methods are considered in the study. It is observed that in all cases the bilateral scoring approach outperforms the currently popular Bayesian information criterion (BIC) method for speaker change detection. This letter discusses the principles of the proposed approach and details the experimental investigations.
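The BIC baseline mentioned above is commonly formulated as a ΔBIC test: a window of feature vectors is modelled either by one full-covariance Gaussian or by two (one on each side of a candidate frame), and a change is hypothesised where the likelihood gain exceeds a model-complexity penalty weighted by a tuning factor λ. The sketch below follows that textbook formulation; it is not the bilateral scoring approach of the letter, and the function names and the default `lam=1.0` are assumptions.

```python
import numpy as np


def _logdet_cov(seg: np.ndarray) -> float:
    """Log-determinant of the maximum-likelihood (biased) covariance of seg."""
    cov = np.atleast_2d(np.cov(seg, rowvar=False, bias=True))
    _sign, logdet = np.linalg.slogdet(cov)
    return float(logdet)


def delta_bic(x: np.ndarray, t: int, lam: float = 1.0) -> float:
    """Delta-BIC for a change at frame t in x (N frames by d features).

    Positive values favour two separate full-covariance Gaussians for
    x[:t] and x[t:], i.e. a speaker change at frame t.
    """
    n, d = x.shape
    if t <= d or n - t <= d:
        raise ValueError("each segment needs more than d frames")
    gain = 0.5 * (n * _logdet_cov(x)
                  - t * _logdet_cov(x[:t])
                  - (n - t) * _logdet_cov(x[t:]))
    # Extra parameters of the two-model hypothesis: one mean plus one covariance.
    penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - penalty


def best_change_point(x: np.ndarray, lam: float = 1.0):
    """Scan all admissible frames; return (t, score), with t None if no change."""
    n, d = x.shape
    candidates = list(range(d + 1, n - d))
    scores = [delta_bic(x, t, lam) for t in candidates]
    i = int(np.argmax(scores))
    t_best, s_best = candidates[i], scores[i]
    return (t_best if s_best > 0 else None), s_best
```

A practical detector would slide or grow this window along the stream; the single-window scan is kept here only to show the decision rule.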
This paper focuses on the spectral representation of the sub-band cepstrum in relation to that of the full-band cepstrum. Through theoretical analysis, it is shown that the net spectral information content of cepstral coefficients with the same index in different sub-bands is only comparable to that of a full-band cepstral parameter whose quefrency index equals that index multiplied by the number of sub-bands. A new method is proposed to tackle this deficiency of the sub-band cepstrum when it is used for text-dependent speaker verification. The experimental investigations clearly demonstrate the effectiveness of this method in speaker verification.
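A simple numerical check makes the index relationship above tangible. Assuming (for illustration only) that cepstra are computed as a DCT-II of log filterbank energies and that the K sub-bands each contain the same number of channels, the DCT basis vector of index n within one sub-band coincides, up to a sign of (-1)^(nk) for sub-band k, with the corresponding slice of the full-band basis vector of index nK. Each sub-band coefficient therefore resolves spectral detail on the same scale as full-band quefrency index nK. This is a simplified illustration, not the paper's derivation or its proposed remedy, and the helper names are hypothetical.

```python
import numpy as np


def dct_basis(index: int, num_channels: int) -> np.ndarray:
    """DCT-II basis vector mapping log filterbank energies to one cepstral coefficient."""
    m = np.arange(num_channels)
    return np.cos(np.pi * index * (m + 0.5) / num_channels)


def subband_matches_fullband(n: int, num_subbands: int, channels_per_band: int) -> bool:
    """Check that sub-band index n equals full-band index n*K on every sub-band slice."""
    total = num_subbands * channels_per_band
    full = dct_basis(n * num_subbands, total)  # full-band quefrency index n*K
    sub = dct_basis(n, channels_per_band)      # sub-band quefrency index n
    for k in range(num_subbands):
        segment = full[k * channels_per_band:(k + 1) * channels_per_band]
        sign = (-1) ** (n * k)                 # phase flip between adjacent sub-bands
        if not np.allclose(segment, sign * sub):
            return False
    return True
```

With 4 sub-bands of 6 channels each, the check holds for every index from 1 to 5, whereas the full-band basis of the *same* index n (rather than 4n) does not match the sub-band basis at all.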