2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018
DOI: 10.1109/icassp.2018.8462358

Deep CNN Based Feature Extractor for Text-Prompted Speaker Recognition

Abstract: Deep learning is still not a very common tool in the speaker verification field. We study deep convolutional neural network performance in the text-prompted speaker verification task. The prompted passphrase is segmented into word states, i.e. digits, to test each digit utterance separately. We train a single high-level feature extractor for all states and use the cosine similarity metric for scoring. The key feature of our network is the Max-Feature-Map activation function, which acts as an embedded feature selector. B…
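The two operations named in the abstract can be illustrated with a minimal sketch. It assumes the common definition of Max-Feature-Map (split the features into two halves and keep the element-wise maximum, which halves the dimension) and plain cosine similarity between two embeddings. The function names and toy vectors are hypothetical and are not taken from the paper; the actual network applies the activation to convolutional feature maps rather than flat lists.

```python
import math


def max_feature_map(features):
    """Split an even-length list into two halves and keep the element-wise maximum."""
    if len(features) % 2 != 0:
        raise ValueError("MFM needs an even number of features")
    half = len(features) // 2
    return [max(a, b) for a, b in zip(features[:half], features[half:])]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy stand-ins for the pre-activation features of one enrolled digit
# and one test digit (illustrative values only).
enroll = max_feature_map([0.2, -1.0, 0.7, 0.5, 0.3, -0.4])
test = max_feature_map([0.1, -0.8, 0.9, 0.6, 0.2, -0.5])
score = cosine_similarity(enroll, test)
```

In a verification setting, a score like this would be compared against a decision threshold for each digit; how the per-digit scores are combined is not stated in the visible part of the abstract.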

citations
Cited by 13 publications
(7 citation statements)
references
References 20 publications
“…In particular, for stimuli identification and feature analysis in HMI devices, which require training with real data, supervised learning algorithms are the most popular because they ensure fast and efficient performance. For example, Lee et al. focused on the integration of machine learning with flexible piezoelectric acoustic sensors, showing huge potential for speech recognition [91]. Figure 6A briefly illustrates the process of machine learning-enhanced P-HMIs and their potential applications in HMIs concerning human voices.…”
Section: Machine Learning-enhanced P-HMIs (mentioning)
confidence: 99%
“…The idea of local perception has been widely applied in the feature-extracted full-channel and local attention channels, used as feature extraction tools to obtain text features [30]. Multi-mode information also has been adopted in the recommendation of microblogging [31], using CNN and RNN to extract features from images and texts, and then combining them to make tag recommendations.…”
Section: Deep Learning Models In Recommendation (mentioning)
confidence: 99%
“…Text-dependent SV task allows us to compare utterances of the same phonetic context [6], [7], or random word sequences coming from a fixed vocabulary [8], [9]. With random sequences, such as random digit strings, an SV system is less vulnerable to replay attacks [8], [10], [11]. In this work, we study a neural acoustic-phonetic approach for SV of random digit strings in RSR2015 Part III database [12].…”
Section: Introduction (mentioning)
confidence: 99%