2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) 2019
DOI: 10.1109/apsipaasc47483.2019.9023352

Phonetic-Attention Scoring for Deep Speaker Features in Speaker Verification

Abstract: Recent studies have shown that frame-level deep speaker features can be derived from a deep neural network with the training target set to discriminate speakers by a short speech segment. By pooling the frame-level features, utterance-level representations, called d-vectors, can be derived and used in the automatic speaker verification (ASV) task. This simple average pooling, however, is inherently sensitive to the phonetic content of the utterance. An interesting idea borrowed from machine translation is the …
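The average-pooling baseline described in the abstract can be illustrated with a minimal sketch. It assumes frame-level features are plain lists of floats and that two d-vectors are compared by cosine similarity; the cosine scoring step, the function names `average_pool` and `cosine_score`, and the toy feature values are assumptions made for illustration and are not taken from the paper. The attention-based scoring the abstract goes on to introduce is truncated in this excerpt and is not sketched here.

```python
import math

def average_pool(frames):
    """Average frame-level feature vectors into one utterance-level vector (a d-vector)."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]

def cosine_score(a, b):
    """Cosine similarity between two vectors (assumed scoring rule, not from the paper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy frame-level features (hypothetical values, 2 dimensions for readability).
enroll_frames = [[1.0, 0.0], [0.0, 1.0]]
test_frames = [[2.0, 2.0]]

enroll_dvector = average_pool(enroll_frames)
test_dvector = average_pool(test_frames)
score = cosine_score(enroll_dvector, test_dvector)
```

Because every frame receives equal weight, the pooled vector depends on which frames (and hence which phones) happen to occur in the utterance, which is the sensitivity to phonetic content that the abstract points out.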

Cited by 2 publications (1 citation statement)
References 30 publications
“…A similar approach has been successfully proposed for CNNs (e.g. in [17,18,19]). Also note that our method shares similarities with i-vector methods that employ ASR to estimate the assignment of frames to speech recognition units, such as senones (e.g.…”
Section: Multi-head Factorized Attentive Pooling (mentioning)
confidence: 99%