Interspeech 2020
DOI: 10.21437/interspeech.2020-1306
Phonetically-Aware Coupled Network For Short Duration Text-Independent Speaker Verification

Cited by 8 publications (9 citation statements). References: 0 publications.
“…TDNN, also known as dilated one-dimensional convolution (Dilated 1D Conv), has been widely used in speaker verification tasks [1,4,5,6]. In [1], researchers propose a TDNN-based model for extracting speaker embeddings.…”
Section: Speaker Embedding (mentioning)
Confidence: 99%
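
The excerpt above notes the equivalence between a TDNN layer and a dilated 1D convolution. Below is a minimal PyTorch sketch of that equivalence; the layer widths, context sizes, and dilations are illustrative assumptions loosely patterned on x-vector-style frame-level layers, not the exact configuration of the model in [1].

```python
import torch
import torch.nn as nn

class TDNNLayer(nn.Module):
    """One TDNN layer expressed as a dilated 1D convolution."""
    def __init__(self, in_dim, out_dim, context_size, dilation):
        super().__init__()
        # A TDNN layer spanning `context_size` frames with frame spacing
        # `dilation` is exactly a Conv1d with those hyperparameters.
        self.conv = nn.Conv1d(in_dim, out_dim, kernel_size=context_size,
                              dilation=dilation)
        self.relu = nn.ReLU()
        self.bn = nn.BatchNorm1d(out_dim)

    def forward(self, x):  # x: (batch, feat_dim, num_frames)
        return self.bn(self.relu(self.conv(x)))

# Illustrative frame-level stack: 30-dim acoustic features, growing dilation.
frames = torch.randn(8, 30, 200)                    # 8 utterances, 200 frames
stack = nn.Sequential(
    TDNNLayer(30, 512, context_size=5, dilation=1),
    TDNNLayer(512, 512, context_size=3, dilation=2),
    TDNNLayer(512, 512, context_size=3, dilation=3),
)
print(stack(frames).shape)                          # torch.Size([8, 512, 186])
```
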
“…In the past few years, deep learning-based speaker embedding has made significant progress. Speaker verification systems utilizing deep learning-based speaker embedding have shown state-of-the-art performance [1,2,3,4,5,6].…”
Section: Introduction (mentioning)
Confidence: 99%
“…Phonetically-aware coupled network (PacNet), as described in [16], is implemented. PacNet focuses on normalizing the effects of phonetic content by directly comparing both the phonetic and acoustic information of the two utterances.…”
Section: Phonetically-aware Coupled Network (mentioning)
Confidence: 99%
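
The excerpt describes PacNet as directly comparing the phonetic and acoustic information of the two utterances in a trial. The sketch below illustrates one plausible reading of that coupling, stacking the acoustic and frame-level phonetic features of both utterances as input channels to a small pairwise comparison network. The class name, channel layout, 2D-CNN head, and the assumption that both feature streams share the same dimensionality are illustrative choices, not the published PacNet architecture in [16].

```python
import torch
import torch.nn as nn

class CoupledComparator(nn.Module):
    """Toy pairwise network over coupled acoustic/phonetic inputs."""
    def __init__(self):
        super().__init__()
        # 4 input channels: acoustic + phonetic maps for each utterance.
        self.cnn = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.score = nn.Linear(32, 1)  # same-speaker score for the pair

    def forward(self, ac1, ph1, ac2, ph2):
        # Each input: (batch, num_frames, feat_dim), dims assumed equal.
        x = torch.stack([ac1, ph1, ac2, ph2], dim=1)  # (batch, 4, T, F)
        return self.score(self.cnn(x).flatten(1))

model = CoupledComparator()
ac1, ph1, ac2, ph2 = (torch.randn(2, 200, 64) for _ in range(4))
print(model(ac1, ph1, ac2, ph2).shape)  # torch.Size([2, 1])
```
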
“…In [17], phonetic features are projected into multi-channel feature maps. Along a similar direction, PacNet [18] uses a coupled stem to jointly learn acoustic features and transform frame-level ASR bottleneck features. In [25], speaker-utterance dual attention (SUDA) is applied to learn the interaction between speaker and utterance information streams in a unified framework for text-dependent SV.…”
Section: Introduction (mentioning)
Confidence: 99%
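
The "coupled stem" mentioned for [18] jointly processes acoustic features and transformed frame-level ASR bottleneck features. A hedged sketch of one such stem follows: a learned projection of the bottleneck features is concatenated with the acoustic frames before the first frame-level convolution. All names, dimensions, and the concatenation-based fusion are hypothetical illustrations, not the exact stem of PacNet.

```python
import torch
import torch.nn as nn

class CoupledStem(nn.Module):
    """Hypothetical stem fusing acoustic and ASR bottleneck features."""
    def __init__(self, acoustic_dim=30, bottleneck_dim=256,
                 proj_dim=30, out_dim=512):
        super().__init__()
        # Learned transform of the frame-level ASR bottleneck features.
        self.proj = nn.Linear(bottleneck_dim, proj_dim)
        # First frame-level layer over the fused representation.
        self.conv = nn.Conv1d(acoustic_dim + proj_dim, out_dim, kernel_size=5)

    def forward(self, acoustic, bottleneck):
        # acoustic:   (batch, num_frames, acoustic_dim)
        # bottleneck: (batch, num_frames, bottleneck_dim), frame-aligned
        fused = torch.cat([acoustic, self.proj(bottleneck)], dim=-1)
        return self.conv(fused.transpose(1, 2))  # (batch, out_dim, T-4)

stem = CoupledStem()
out = stem(torch.randn(4, 200, 30), torch.randn(4, 200, 256))
print(out.shape)  # torch.Size([4, 512, 196])
```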