A Simple But Effective Approach to Speaker Tracking in Broadcast News

Rodríguez, Luis Javier; Peñagarikano, Mikel; Bordel, Germán

doi:10.1007/978-3-540-72849-8_7

Cited by 6 publications

(4 citation statements)

References 8 publications

“…The task and evaluation measures were cast in a detection and retrieval framework, respectively, and can be compared to a known item retrieval task in text retrieval. Most speaker tracking systems solve the task by performing speaker diarization followed by speaker detection [9], [10]. Although this problem is similar to our approach for large scale diarization, there are two important differences.…”

Section: Related Work (mentioning)

confidence: 99%

Large-Scale Speaker Diarization for Long Recordings and Small Collections

Huijbregts

Leeuwen

2012

IEEE Trans. Audio Speech Lang. Process.

Performing speaker diarization of very long recordings is a problem for most diarization systems that are based on agglomerative clustering with an HMM topology. Performing collection-wide speaker diarization, where each speaker is identified uniquely across the entire collection, is an even more challenging task. In this paper we propose a method with which it is possible to efficiently perform diarization of long recordings. We have also applied this method successfully to a collection with a total duration of approximately 15 hours. The method consists of first segmenting long recordings into smaller chunks on which diarization is performed. Next, a speaker detection system is used to link the speech clusters from each chunk and to assign a unique label to each speaker in the long recording or in the small collection. We show for three different audio collections that it is possible to perform high-quality diarization with this approach. The long meetings from the ICSI corpus are processed 5.5 times faster than originally needed, and by uniquely labeling each speaker across the entire collection it becomes possible to perform speaker-based information retrieval with high accuracy (mean average precision of 0.57).

“…In speaker tracking, the task is to find spoken segments of a particular speaker for which some training material is given. Most speaker tracking systems solve the task by performing speaker segmentation followed by speaker detection [5,6]. Because only a selection of a-priorly known people are tracked, labeling clusters with corresponding names is straightforward.…”

Section: Related Work (mentioning)

confidence: 99%

Diarization-based speaker retrieval for broadcast television archives

Huijbregts

Leeuwen

2011

Interspeech 2011

In this study we extend a query-by-example diarization-based speaker retrieval system to a full speaker retrieval system for broadcast television. The envisioned system is capable of finding all speakers in an archive using their names instead of example speech fragments. Information extracted from a television guide is used to label speaker clusters that most likely correspond to the found names. As part of the labeling process, all speaker clusters are first classified automatically based on their role in the programs they appear in. The role classification accuracy is 64% on our evaluation set. Speaker names can automatically be attributed to a fraction of the speaker clusters with an accuracy of 70%.

“…This involves applying a threshold τ and forcing a minimum segment size δ. In practice, a boundary t is validated when its cross-likelihood ratio exceeds τ and there is no candidate boundary with greater ratio in the interval [t-δ,t+δ] (see [13] for details).…”

Section: Audio Segmentation (mentioning)

confidence: 99%
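
The boundary validation rule quoted in this statement (threshold τ, minimum segment size δ) can be illustrated with a short sketch. This is only one possible reading of the quoted sentence, not the cited authors' implementation: the function name `validate_boundaries`, the list-based inputs, and the handling of ties are assumptions made for illustration.

```python
from typing import List, Sequence


def validate_boundaries(
    times: Sequence[float],
    ratios: Sequence[float],
    tau: float,
    delta: float,
) -> List[float]:
    """Return the candidate boundary times that pass validation.

    A candidate at time t is kept when its ratio exceeds tau and no other
    candidate within [t - delta, t + delta] has a strictly greater ratio.
    Assumption: candidates with exactly equal ratios do not suppress
    each other, so both are kept.
    """
    accepted = []
    for i, (t, r) in enumerate(zip(times, ratios)):
        if r <= tau:
            # The ratio must strictly exceed the threshold.
            continue
        dominated = any(
            j != i and abs(times[j] - t) <= delta and ratios[j] > r
            for j in range(len(times))
        )
        if not dominated:
            accepted.append(t)
    return accepted


# Hypothetical candidates: times in seconds with made-up ratio values.
candidate_times = [1.0, 1.5, 4.0, 9.0]
candidate_ratios = [2.0, 3.0, 0.5, 2.5]
boundaries = validate_boundaries(candidate_times, candidate_ratios, tau=1.0, delta=1.0)
print(boundaries)
```

In this toy input, the candidate at 1.0 s is suppressed by the stronger candidate at 1.5 s, and the candidate at 4.0 s falls below the threshold, so only 1.5 s and 9.0 s remain.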

Low-latency online speaker tracking on the AMI Corpus of meeting conversations

Zamalloa

Rodríguez-Fuentes

Bordel

et al. 2010

2010 IEEE International Conference on Acoustics, Speech and Signal Processing

Self Cite

Ambient Intelligence aims to create smart spaces providing services in a transparent and non-intrusive fashion, so context awareness and user adaptation are key issues. Speech can be exploited for user adaptation in such scenarios by continuously tracking speaker identity. However, most speaker tracking approaches require processing the full audio recording before determining speaker turns, which makes them unsuitable for online processing and low-latency decision-making. In this work a low-latency speaker tracking system is presented, which deals with continuous audio streams and outputs decisions at one-second intervals, by scoring fixed-length audio segments with a set of target speaker models. A smoothing technique is explored, based on the scores of past segments, which increases the robustness of tracking decisions to local variability. Experimental results are reported on the AMI Corpus of meeting conversations, revealing the effectiveness of the proposed approach when compared to an offline speaker tracking approach developed for reference.
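
The abstract does not specify how past segment scores are combined, so the following is only a hedged sketch of the general idea of online tracking with score smoothing. The exponential moving average, the `alpha` weight, the decision `threshold`, and the function name `track_speakers` are all assumptions introduced for illustration, not the method of the cited paper.

```python
from typing import Dict, List, Optional


def track_speakers(
    segment_scores: List[Dict[str, float]],
    alpha: float = 0.5,
    threshold: float = 0.0,
) -> List[Optional[str]]:
    """Emit one decision per fixed-length segment, using smoothed scores.

    segment_scores holds, for each segment in time order, a non-empty
    mapping from target speaker name to that speaker model's score.
    Assumption: smoothing is an exponential moving average over past
    segments; the first segment's score initializes the average.
    """
    smoothed: Dict[str, float] = {}
    decisions: List[Optional[str]] = []
    for scores in segment_scores:
        for speaker, score in scores.items():
            previous = smoothed.get(speaker, score)
            smoothed[speaker] = alpha * score + (1.0 - alpha) * previous
        best = max(smoothed, key=smoothed.get)
        # Report no target speaker when even the best smoothed score is low.
        decisions.append(best if smoothed[best] > threshold else None)
    return decisions


# Hypothetical scores for three one-second segments and two target speakers.
scores = [
    {"A": 2.0, "B": -1.0},
    {"A": -0.5, "B": 0.0},
    {"A": -3.0, "B": 1.0},
]
print(track_speakers(scores, alpha=0.5, threshold=0.0))
```

In this toy input the second segment scores poorly for speaker A in isolation, but the smoothed score still favors A, which is the kind of robustness to local variability the abstract describes.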
