Analysis of Phonetic Dependence of Segmentation Errors in Speaker Diarization

McKnight, Simon W.; Hogg, Aidan O. T.; Naylor, Patrick A.

doi:10.23919/eusipco47968.2020.9287552

Cited by 2 publications

(1 citation statement)

References 14 publications


“…Even more problematically, most labelling readily available is much less accurate, so ideally a way could be found to take advantage of the less accurate labels without penalising the scoring of more accurately labelled systems. Uniform forgiveness collars are crude attempts to mitigate this problem [17]. These exclude collars of plus and minus the collar size around the GT-labels from scoring.…”

Section: Human Reviews Analysis (mentioning)

confidence: 99%

Studying Human-Based Speaker Diarization and Comparing to State-of-the-Art Systems

McKnight, Hogg, Neo, et al. 2022

2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)


Human-based speaker diarization experiments were carried out on a five-minute extract of a typical AMI corpus meeting to see how much variance there is in human reviews based on hearing only and to compare with state-of-the-art diarization systems on the same extract. There are three distinct experiments: (a) one with no prior information; (b) one with the ground truth speech activity detection (GT-SAD); and (c) one with the blank ground truth labels (GT-labels). The results show that most human reviews tend to be quite similar, albeit with some outliers, but the choice of GT-labels can make a dramatic difference to scored performance. Using the GT-SAD provides a big advantage and improves human review scores substantially, though small differences in the GT-SAD used can have a dramatic effect on results. The use of forgiveness collars is shown to be unhelpful. The results show that state-of-the-art systems can outperform the best human reviews when no prior information is provided. However, the best human reviews still outperform state-of-the-art systems when starting from the GT-SAD.
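The citation statement above describes uniform forgiveness collars as excluding plus and minus the collar size around the GT-labels from scoring. A minimal Python sketch of that idea follows. It is an illustration only, not the scoring code used in either paper: the function names `collar_regions` and `scored_duration`, the `(start, end)` segment representation in seconds, and the 0.25 s collar in the usage note are all assumptions made here.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds; assumed representation


def collar_regions(segments: List[Interval], collar: float) -> List[Interval]:
    """Return merged intervals of +/- collar around every reference boundary."""
    raw = []
    for start, end in segments:
        raw.append((start - collar, start + collar))
        raw.append((end - collar, end + collar))
    raw.sort()

    merged: List[Interval] = []
    for lo, hi in raw:
        if merged and lo <= merged[-1][1]:
            # Overlapping collars are joined so no time is excluded twice.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def scored_duration(total: float, segments: List[Interval], collar: float) -> float:
    """Return how much of a recording of length `total` is still scored."""
    excluded = 0.0
    for lo, hi in collar_regions(segments, collar):
        # Clip each collar to the recording before counting it.
        lo, hi = max(lo, 0.0), min(hi, total)
        if hi > lo:
            excluded += hi - lo
    return total - excluded
```

For example, with hypothetical reference segments `[(1.0, 2.0), (2.1, 3.0)]`, a 0.25 s collar and a 5 s recording, the collars around 2.0 s and 2.1 s merge, and roughly 3.4 s of audio remains scored. The sketch also shows why a uniform collar is crude: the same amount of audio is discarded around every boundary, whether or not that boundary was labelled accurately.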


Overlapping Speaker Segmentation Using Multiple Hypothesis Tracking of Fundamental Frequency

Hogg, Evers, Moore, et al. 2021

IEEE/ACM Trans. Audio Speech Lang. Process.


This paper demonstrates how the harmonic structure of voiced speech can be exploited to segment multiple overlapping speakers in a speaker diarization task. We explore how a change in the speaker can be inferred from a change in pitch. We show that voiced harmonics can be useful in detecting when more than one speaker is talking, such as during overlapping speaker activity. A novel system is proposed to track multiple harmonics simultaneously, allowing for the determination of onsets and end-points of a speaker's utterance in the presence of an additional active speaker. This system is benchmarked against a segmentation system from the literature that employs a bidirectional long short-term memory network (BLSTM) approach and requires training. Experimental results highlight that the proposed approach outperforms the BLSTM baseline approach by 12.9% in terms of HIT rate for speaker segmentation. We also show that the estimated pitch tracks of our system can be used as features to the BLSTM to achieve further improvements of 1.21% in terms of coverage and 2.45% in terms of purity.
