2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2016
DOI: 10.1109/icassp.2016.7472746

Investigation of speaker embeddings for cross-show speaker diarization

Cited by 2 publications (2 citation statements)
References 15 publications
“…Although the use of a pre-trained sequence extractor clearly helps in building a character-oriented voice representation, it could also create biases in the system. If certain works [22,23,27] have already been interested in the information encoded by the embeddings, it seems however legitimate to wonder if this pretraining does not guide the p-vectors too strongly towards speaker information. This article is dedicated to this question and aims to verify the two following hypotheses:…”
Section: Introduction (mentioning)
confidence: 99%
“…The curves corresponding to the selected parameter set (w_F = 50, t_k = 2, ρ = 4) are highlighted with thicker lines. The results of the multimodal evaluation are presented in Table 6.5, together with results from the LIUM system and the unimodal (audio-only) version of the proposed algorithm. LIUM is a state-of-the-art speaker diarization system based on audio analysis, developed by Rouvier et al. [151]. Unimodal (audio) and multimodal (audiovisual) speaker diarization performance for the proposed algorithm and existing state-of-the-art implementations…”
unclassified