ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019
DOI: 10.1109/icassp.2019.8683114

Incremental Transfer Learning in Two-pass Information Bottleneck Based Speaker Diarization System for Meetings

Abstract: The two-pass information bottleneck (TPIB) based speaker diarization system operates independently on different conversational recordings. The TPIB system does not consider previously learned speaker-discriminative information while diarizing new conversations. Hence, the real-time factor (RTF) of the TPIB system is high owing to the training time required for the artificial neural network (ANN). This paper attempts to improve the RTF of the TPIB system using an incremental transfer learning approach where the paramet…

citations
Cited by 8 publications
(4 citation statements)
references
References 18 publications
“…It is worth noting that although AMI is widely used for speaker diarisation, most studies either use different sets of meetings for testing, or audios recorded by independent headset microphones, such as (Bullock et al, 2020; Dawalatabad et al, 2019; Pal et al, 2020a), which makes such results not comparable to those presented in this paper. The official speech recognition data partition and MDM audios are used as a realistic setup for meeting transcription.…”
Section: Dataset Details (mentioning)
confidence: 83%
“…For finding the speaker clusters in a sequence of x-vectors, the variational Bayesian hidden Markov model (VBx) was investigated in [15,7]. For continuously learning speaker discriminative information, "Remember-Learn-Transfer" was proposed in [16]. Applying lexical and acoustic information for SD was investigated in [17].…”
Section: Related Work (mentioning)
confidence: 99%
“…An incremental transfer learning based approach was proposed recently by us, which uses the "Remember-Learn-Transfer" approach to continuously learn speaker discriminative information [34]. This reduces the RTF of TPIB (with MFNN) system [8] by 33% (relative) but with minor degradation in the speaker error rate.…”
Section: Approaches To Speaker Diarization (mentioning)
confidence: 99%
“…The list of meeting IDs from AMI datasets recorded at Idiap (IS) and Edinburgh (ES) is given in Table I. The datasets are also used in [8] and [34] for evaluation. The number of speakers in the NIST dataset varies from 3 to 10.…”
Section: A Datasets Features and Evaluation Measure (mentioning)
confidence: 99%
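
One of the citation statements above reports a 33% relative reduction in RTF. As a rough illustration of what that figure means, the sketch below computes the real-time factor as processing time divided by audio duration, then the relative reduction between two systems. The function names and all timings are hypothetical values chosen for illustration; they are not measurements from the paper.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent processing / duration of the audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


# Hypothetical timings for a 20-minute (1200 s) meeting recording.
baseline_rtf = real_time_factor(processing_seconds=1800.0, audio_seconds=1200.0)
improved_rtf = real_time_factor(processing_seconds=1200.0, audio_seconds=1200.0)

reduction = relative_reduction(baseline_rtf, improved_rtf)
print(f"baseline RTF = {baseline_rtf:.2f}, improved RTF = {improved_rtf:.2f}")
print(f"relative RTF reduction = {reduction:.0%}")
```

A relative reduction is measured against the baseline RTF, so the same percentage corresponds to a different absolute saving depending on how slow the baseline system is.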