This paper presents an overview of a state-of-the-art text-independent speaker verification system. First, an introduction proposes a modular scheme of the training and test phases of a speaker verification system. Then, the most commonly used speech parameterization in speaker verification, namely, cepstral analysis, is detailed. Gaussian mixture modeling, which is the speaker modeling technique used in most systems, is then explained. A few speaker modeling alternatives, namely, neural networks and support vector machines, are mentioned. Normalization of scores is then explained, as this is a very important step in dealing with real-world data. The evaluation of a speaker verification system is then detailed, and the detection error trade-off (DET) curve is explained. Several extensions of speaker verification are then enumerated, including speaker tracking and segmentation by speakers. Then, some applications of speaker verification are proposed, including on-site applications, remote applications, applications related to structuring audio information, and games. Issues concerning the forensic area are then recalled, as we believe it is very important to inform people about the actual performance and limitations of speaker verification systems. This paper concludes by giving a few research trends in speaker verification for the next couple of years.
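As a rough illustration of the evaluation step mentioned in this abstract, the sketch below computes the two error rates that a DET curve trades off, the false alarm rate and the miss rate, at each candidate decision threshold. The function names `det_points` and `approx_eer` and the toy scores are assumptions made for this example and do not come from the paper. A real DET plot usually draws both rates on a normal-deviate scale, which is omitted here.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (threshold, false_alarm_rate, miss_rate)


def det_points(target_scores: List[float], impostor_scores: List[float]) -> List[Point]:
    """Return one (threshold, false alarm rate, miss rate) triple per candidate threshold.

    Assumption for this sketch: a trial is accepted when its score is >= threshold.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    points = []
    for t in thresholds:
        # Miss: a genuine (target) trial is rejected.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: an impostor trial is accepted.
        false_alarm = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        points.append((t, false_alarm, miss))
    return points


def approx_eer(points: List[Point]) -> float:
    """Approximate the equal error rate from the point where both rates are closest."""
    _, false_alarm, miss = min(points, key=lambda p: abs(p[1] - p[2]))
    return (false_alarm + miss) / 2


# Toy scores, invented for illustration only.
targets = [2.0, 1.5, 0.5, 3.0]
impostors = [0.0, 1.0, -1.0, 0.7]
points = det_points(targets, impostors)
eer = approx_eer(points)
```

Sweeping the threshold from low to high moves the operating point from "accept everything" (no misses, all false alarms) to "reject almost everything", which is the trade-off the DET curve visualizes.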
This paper describes a new technique, called the empirical mode decomposition (EMD), that allows the decomposition of one-dimensional signals into intrinsic oscillatory modes. The components, called intrinsic mode functions (IMFs), allow the calculation of a meaningful multicomponent instantaneous frequency. Applied to a seismic trace, the EMD allows us to study the different intrinsic oscillatory modes and instantaneous frequencies of the trace. Applied to a seismic section, it provides new time-frequency attributes.
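The abstract does not spell out how the modes are extracted. As a minimal sketch, assuming the usual "sifting" formulation of EMD, one sifting step subtracts the mean of an upper and a lower envelope from the signal. The code below is a simplification: it uses piecewise-linear envelopes (cubic splines are the more common choice) and treats both endpoints as envelope knots, and the helper names `_interp` and `sift_once` are hypothetical. A full EMD would repeat this step until the result qualifies as an IMF, subtract that IMF, and continue on the residual.

```python
from typing import List, Tuple

Knot = Tuple[int, float]  # (sample index, value)


def _interp(knots: List[Knot], n: int) -> List[float]:
    """Piecewise-linear interpolation through knots, evaluated at indices 0..n-1.

    Assumes the knots start at index 0, end at index n-1, and strictly increase.
    """
    out = [0.0] * n
    for (i0, v0), (i1, v1) in zip(knots, knots[1:]):
        for i in range(i0, i1 + 1):
            out[i] = v0 + (v1 - v0) * (i - i0) / (i1 - i0)
    return out


def sift_once(x: List[float]) -> List[float]:
    """One sifting step: remove the local mean of the two envelopes (needs len(x) >= 2)."""
    n = len(x)
    first, last = (0, x[0]), (n - 1, x[-1])
    maxima = [(i, x[i]) for i in range(1, n - 1) if x[i - 1] < x[i] > x[i + 1]]
    minima = [(i, x[i]) for i in range(1, n - 1) if x[i - 1] > x[i] < x[i + 1]]
    # Assumption: endpoints serve as knots of both envelopes (a crude boundary rule).
    upper = _interp([first] + maxima + [last], n)
    lower = _interp([first] + minima + [last], n)
    return [x[i] - (upper[i] + lower[i]) / 2 for i in range(n)]


# Toy signal: a fast oscillation riding on a slow linear trend.
signal = [(-1) ** i + 0.1 * i for i in range(11)]
mode = sift_once(signal)
```

On this toy signal, a single step removes the linear trend away from the boundaries and leaves the fast oscillation, which is the kind of separation into oscillatory modes that the abstract describes. The boundary samples are distorted by the crude endpoint rule.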
This paper presents the ELISA consortium activities in automatic speaker segmentation during the NIST 2002 evaluation. Two different approaches from the CLIPS and LIA laboratories are presented, and the possibility of combining them, either by applying them consecutively or by fusing the decisions made by each of them, is investigated. Various types of data were available for NIST 2002. The ELISA systems obtained the lowest error rates for two corpora: the CLIPS system obtained the best performance on the Meeting data, and the LIA system obtained the best performance on the Switchboard data. The combining strategies proposed in this paper allowed us to improve the performance of the best single system on both data types (up to 30% error rate reduction).