2020
DOI: 10.48550/arxiv.2005.07817
Preprint

Weakly Supervised Training of Hierarchical Attention Networks for Speaker Identification

Abstract: Identifying multiple speakers without knowing where each speaker's voice occurs in a recording is a challenging task. In this paper, a hierarchical attention network is proposed to solve a weakly labelled speaker identification problem. The hierarchical structure, consisting of a frame-level encoder and a segment-level encoder, aims to learn speaker-related information both locally and globally. Speech streams are segmented into fragments. The frame-level encoder with attention learns features and highlights the …
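The two-level structure described in the abstract can be illustrated with a minimal sketch of attention pooling applied first over the frames in each segment and then over the resulting segment vectors. This is not the authors' implementation: the encoder layers are omitted entirely, and the names `attention_pool`, `hierarchical_pool`, `frame_query`, and `segment_query` are hypothetical, with the query vectors standing in for whatever learned attention parameters the model actually uses.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, query):
    """Pool a list of vectors into one vector.

    Each vector is scored by its dot product with `query` (a hypothetical
    learned parameter); the softmax of the scores gives the weights.
    """
    scores = [sum(v_i * q_i for v_i, q_i in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def hierarchical_pool(segments, frame_query, segment_query):
    """Frame-level pooling inside each segment, then segment-level pooling."""
    segment_vectors = [attention_pool(frames, frame_query)[0] for frames in segments]
    return attention_pool(segment_vectors, segment_query)


# Two segments, each a list of 2-dimensional frame features (made-up numbers).
segments = [
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.0, 1.0], [0.2, 0.8], [0.1, 0.9]],
]
utterance_vector, segment_weights = hierarchical_pool(
    segments, frame_query=[1.0, 0.0], segment_query=[0.0, 1.0]
)
```

The segment weights returned at the top level indicate which fragments contributed most to the utterance-level vector, which is the sense in which attention "highlights" parts of the recording.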

Cited by 1 publication (9 citation statements)
References 25 publications
“…The final speaker identities are the output vector which contains the scores (between 1 and 0) for each speaker. The model is trained using binary cross entropy loss (shown in Eq 1) [7,6]:…”
Section: Model Architecture (mentioning)
confidence: 99%
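The citation statement above refers to a binary cross entropy loss over per-speaker scores, but its Eq 1 is cut off in the excerpt. The sketch below therefore uses the standard textbook definition rather than the citing paper's exact formula. The function name `binary_cross_entropy`, the averaging over speakers, and the clamping constant `eps` are assumptions made for illustration.

```python
import math


def binary_cross_entropy(scores, targets, eps=1e-12):
    """Mean binary cross entropy over speakers for one recording.

    scores:  predicted score in [0, 1] for each speaker
    targets: 1 if the speaker is present in the recording, else 0
    eps:     clamp to keep log() finite (an assumption, not from the source)
    """
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    total = 0.0
    for p, y in zip(scores, targets):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(scores)


# Three candidate speakers; the first and third are present (made-up numbers).
scores = [0.9, 0.2, 0.7]
targets = [1, 0, 1]
loss = binary_cross_entropy(scores, targets)
```

Because each speaker gets an independent score and an independent binary target, several speakers can be marked present in the same recording, which suits the multi-speaker, weakly labelled setting described in the abstract.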
“…For this work, as there is no data for weakly supervised speaker identification task, and to compare with our previous published results, the data construction process is the same as that in our previous work [6].…”
Section: Data (mentioning)
confidence: 99%