Remote speaker verification services typically rely on the system having access to the user's recordings, or features derived from them, and/or a model of the user's voice. This conventional approach raises several privacy concerns. In this work, we address this privacy problem in the context of a speaker verification system that uses a factor-analysis-based frontend extractor, the so-called i-vectors. Preserving privacy in our context means that the system never observes voice samples or speech models from the user, and the user never observes the universal model owned by the system. This is achieved by transforming speaker i-vectors into bit strings in a way that allows for the computation of approximate distances, instead of exact ones. The transformation relies on a hashing scheme known as secure binary embeddings. An SVM classifier with a modified kernel then operates on the hashes. Experiments showed that the secure system yielded results comparable to those of its non-private counterpart. The approach may be extended to other types of biometric authentication.
Speaker authentication systems require access to the voice of the user. A person's voice carries information about their gender, nationality, and so on, all of which become accessible to the system, which could abuse this knowledge. The system also stores users' voice prints; these may be stolen and used to impersonate the users elsewhere. It is therefore important to develop privacy-preserving voice authentication techniques that enable a system to authenticate users by their voice, while simultaneously obscuring the users' voices and voice patterns from the system. Prior work in this area has employed expensive cryptographic tools, or has cast authentication as an exact-match problem at the cost of accuracy. In this paper we present a new technique that employs secure binary embeddings of feature vectors to perform voice authentication in a privacy-preserving manner with minimal computational overhead and little loss of classification accuracy.
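The secure-binary-embedding idea described in the abstracts above can be illustrated with a minimal sketch. This assumes the quantized-random-projection form of the scheme, where each hash bit is computed as floor((a_i·x + w_i)/Δ) mod 2 with Gaussian projections a_i and uniform dither w_i; the dimensions, Δ, and variable names here are illustrative, not taken from the papers. The key property is that the normalized Hamming distance between hashes tracks the Euclidean distance between vectors only when the vectors are close, and saturates near 0.5 (i.e., reveals essentially nothing) when they are far apart.

```python
import numpy as np

rng = np.random.default_rng(0)

def sbe_hash(x, A, w, delta):
    """Secure binary embedding: quantized, dithered random projections.

    Each bit is floor((a_i . x + w_i) / delta) mod 2.
    """
    return (np.floor((A @ x + w) / delta) % 2).astype(np.uint8)

# Illustrative sizes: a 400-dim vector hashed to 1024 bits.
dim, n_bits, delta = 400, 1024, 1.0
A = rng.standard_normal((n_bits, dim))     # random projection matrix
w = rng.uniform(0.0, delta, size=n_bits)   # uniform dither

x = rng.standard_normal(dim)
near = x + 0.001 * rng.standard_normal(dim)  # slightly perturbed copy of x
far = rng.standard_normal(dim)               # unrelated vector

# Normalized Hamming distances between hashes.
d_near = np.mean(sbe_hash(x, A, w, delta) != sbe_hash(near, A, w, delta))
d_far = np.mean(sbe_hash(x, A, w, delta) != sbe_hash(far, A, w, delta))
# d_near stays small (it tracks the small Euclidean distance),
# while d_far saturates near 0.5, leaking little about the far vector.
```

In a verification system along the lines sketched in the abstracts, a classifier (e.g., an SVM with a kernel defined on these Hamming distances) would then operate on the hashes alone, so neither party needs to reveal the underlying vectors.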
This paper summarizes the contributions to semantic video search that can be derived from the audio signal. Because of space restrictions, the emphasis will be on non-linguistic cues. The paper thus covers what is generally known as audio segmentation, as well as audio event detection. Using machine learning approaches, we have built detectors for over 50 semantic audio concepts.