2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings
DOI: 10.1109/icassp.2006.1660195

Who Really Spoke When? Finding Speaker Turns and Identities in Broadcast News Audio

Abstract: Automatic speaker segmentation and clustering methods have improved considerably over the last few years in the Broadcast News domain. However, these generally still produce locally consistent relative labels (such as spkr1, spkr2) rather than true speaker identities (such as Bill Clinton, Ted Koppel). This paper presents a system which attempts to find these true identities from the text transcription of the audio using lexical pattern matching, and shows the effect on performance when using state-of-the-art …

Cited by 30 publications (46 citation statements)
References 4 publications
“…Blocking rules are used to stop rules firing in certain contexts; for example, the sequence "[name] reports" assigns the next speaker to be [name] unless it is followed by the word "that." An extension of this system, described in [47], learns many rules and their associated probabilities of being correct automatically from the training data and then applies them simultaneously on the test data using probabilistic combination. Using automatic transcriptions and automatically found speaker turns naturally degrades performance, but potentially 85% of the time can still be correctly assigned to the true speaker identity using this method.…”
Section: H. Finding Identities (mentioning)
Confidence: 99%
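
As a rough illustration of the rule-and-blocking idea described in this excerpt, the following Python sketch applies a couple of hypothetical lexical patterns to automatically found speaker turns. The patterns, names, and precision weights are invented for illustration and are not taken from the paper; a probabilistic combination step, as in [47], would then merge the weighted proposals.

```python
import re

# Minimal sketch (not the paper's implementation) of lexical pattern matching with a
# blocking rule: "<name> reports" proposes <name> as the next speaker unless it is
# followed by the word "that". Rules, names, and precisions are illustrative only.
RULES = [
    # "<First Last> reports" => the NEXT turn is spoken by <First Last>, blocked by "that"
    {"pattern": re.compile(r"\b([A-Z][a-z]+ [A-Z][a-z]+) reports\b(?! that)"),
     "offset": +1, "precision": 0.90},
    # "thank you, <First Last>" => the PREVIOUS turn was spoken by <First Last>
    {"pattern": re.compile(r"\bthank you,? ([A-Z][a-z]+ [A-Z][a-z]+)\b", re.IGNORECASE),
     "offset": -1, "precision": 0.80},
]

def propose_identities(turns):
    """turns: transcript text of each automatically found speaker turn.
    Returns {turn_index: [(name, precision), ...]} of candidate true identities."""
    proposals = {}
    for i, text in enumerate(turns):
        for rule in RULES:
            for match in rule["pattern"].finditer(text):
                j = i + rule["offset"]          # turn that the rule points at
                if 0 <= j < len(turns):
                    proposals.setdefault(j, []).append((match.group(1), rule["precision"]))
    return proposals

turns = [
    "And now Jane Smith reports from the White House.",
    "Officials here say the vote will be close.",
    "Thank you, Jane Smith. In other news tonight ...",
]
print(propose_identities(turns))   # turn 1 should be attributed to "Jane Smith" by both rules
```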
“…Note that a parallel approach for lexical SID in TV shows is to use the lexical context around spoken names to classify the names as speaker, addressee, or object [8,29,24]. In contrast, this work does not depend on spoken names (and hence not on a Named Entity Recognizer), but rather analyzes the general lexical content of speech.…”
Section: Introduction (mentioning)
Confidence: 99%
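
A rough sketch of the "general lexical content" alternative mentioned in this excerpt: rather than spotting spoken names, train a text classifier that predicts who is speaking from the words they typically use. The speakers, training turns, and model choice below are purely illustrative assumptions, not the cited work's setup.

```python
# Sketch: identify the speaker of a turn from its lexical content alone (no spoken names).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: transcripts of turns with known speakers (hypothetical labels).
train_texts = [
    "good evening and welcome to the program tonight we begin with",
    "thank you it is a pleasure to be here with you tonight",
    "reporting live from the scene officials tell us the investigation continues",
    "back to you in the studio that is the latest from here",
]
train_speakers = ["anchor_A", "guest_B", "reporter_C", "reporter_C"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(train_texts, train_speakers)

# Predict the speaker of an unlabeled turn from its lexical content.
print(model.predict(["we go live to the scene where officials say the investigation continues"]))
```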
“…The use of biometric models for speaker identification appears in [5,14]. However, these audio-only approaches did not achieve good performance because of high error rates caused by poor speech transcriptions and erroneous named-entity detection.…”
Section: Introduction (mentioning)
Confidence: 99%