Interspeech 2019
DOI: 10.21437/interspeech.2019-2417

Phonetically-Aware Embeddings, Wide Residual Networks with Time-Delay Neural Networks and Self Attention Models for the 2018 NIST Speaker Recognition Evaluation

Cited by 6 publications (9 citation statements)
References 13 publications
“…Moreover, this architecture needs positional information [7] for the self attention layers to provide a good performance. Instead of using temporal positional information as many language modelling applications, we use the output of a phonetic classifier bottleneck [10,14]. The architecture of the phonetic classifier is an evolution of [10].…”
Section: Mean (mentioning)
confidence: 99%
“…Instead of using temporal positional information as many language modelling applications, we use the output of a phonetic classifier bottleneck [10,14]. The architecture of the phonetic classifier is an evolution of [10]. In this system, we use a modification of efficient net [17] to operate with 1D group convolutions as backbone.…”
Section: Mean (mentioning)
confidence: 99%
“…From each original utterance we obtain phoneme labels, one per input feature vector. These phoneme labels in this experiment were obtained by automatic means, that is, a DNN phoneme classifier [35] consisting of a Wide Residual Network [36] with four blocks. Only 39 phoneme labels were considered, that is, each phoneme label includes all its associated coarticulation.…”
Section: Reduction Of the Mismatch In α: Phonetic Balance (mentioning)
confidence: 99%