This paper presents a novel fully automatic bi-modal, face and speaker, recognition system which runs in real-time on a mobile phone. The implemented system runs in real-time on a Nokia N900 and demonstrates the feasibility of performing both automatic face and speaker recognition on a mobile phone. We evaluate this recognition system on a novel publicly-available mobile phone database and provide a well defined evaluation protocol. This database was captured almost exclusively using mobile phones and aims to improve research into deploying biometric techniques to mobile devices. We show, on this mobile phone database, that face and speaker recognition can be performed in a mobile environment and using score fusion can improve the performance by more than 25% in terms of error rates.
Spoofing countermeasures aim to protect automatic speaker verification systems from being manipulated by spoofed speech signals. While results from the most recent ASVspoof 2019 evaluation show great potential to detect most forms of attack, some continue to evade detection. This paper reports the first application of RawNet2 to anti-spoofing. RawNet2 ingests raw audio and has potential to learn cues that are not detectable using more traditional countermeasure solutions. We describe modifications made to the original RawNet2 architecture so that it can be applied to anti-spoofing. For A17 attacks, our RawNet2 systems results are the second-best reported, while the fusion of RawNet2 and baseline countermeasures gives the second-best results reported for the full ASVspoof 2019 logical access condition. Our results are reproducible with open source software.
International audienceThis paper evaluates the performance of face and speaker verification techniques in the context of a mobile environment. The mobile environment was chosen as it provides a realistic and challenging test-bed for biometric person verification techniques to operate. For instance the audio environment is quite noisy and there is limited control over the illumination conditions and the pose of the subject for the video. To conduct this evaluation, a part of a database captured during the " Mobile Biometry " (MOBIO) European Project was used. In total there were nine participants to the evaluation who submitted a face verification system and five participants who submitted speaker verification systems. The results have shown that the best performing face and speaker verification systems obtained the same level of performance, respectively 10.9% and 10.6% of HTER
SIDEKIT is a new open-source Python toolkit that includes a large panel of state-of-the-art components and allow a rapid prototyping of an end-to-end speaker recognition system. For each step from front-end feature extraction, normalization, speech activity detection, modelling, scoring and visualization, SIDEKIT offers a wide range of standard algorithms and flexible interfaces. The use of a single efficient programming and scripting language (Python in this case), and the limited dependencies, facilitate the deployment for industrial applications and extension to include new algorithms as part of the whole tool-chain provided by SIDEKIT. Performance of SIDEKIT is demonstrated on two standard evaluation tasks, namely the RSR2015 and NIST-SRE 2010.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.