The IberSpeech-RTVE Challenge, presented at IberSpeech 2018, is a new Albayzin evaluation series supported by the Spanish Thematic Network on Speech Technologies (Red Temática en Tecnologías del Habla, RTTH). The series focused on speech-to-text transcription, speaker diarization, and multimodal diarization of television programs. For this purpose, the Corporación Radio Televisión Española (RTVE), the main public service broadcaster in Spain, and the RTVE Chair at the University of Zaragoza made more than 500 h of broadcast content and subtitles available to researchers. The dataset included around 20 programs of different genres and topics produced and broadcast by RTVE between 2015 and 2018. The programs pose several challenges from the point of view of speech technologies, such as the diversity of Spanish accents, overlapping speech, spontaneous speech, acoustic variability, background noise, and domain-specific vocabulary. This paper describes the database and the evaluation process and summarizes the results obtained.

…in Spanish [8][9][10][11], and more recently, the Multi-Genre Broadcast (MGB) Challenge, with data in English and Arabic [12][13][14]. Beyond broadcast speech, several other evaluation campaigns have been proposed, such as those organized within the Zero Resource Speech Challenge [15,16], the TC-STAR evaluation on recordings of the European Parliament's sessions in English and Spanish [5], and the MediaEval evaluation of multimodal search and hyperlinking [17].

As a way to measure the performance of different techniques and approaches, the 2018 edition of the IberSpeech-RTVE Challenge evaluation campaign proposed three tasks: speech-to-text transcription (STT), speaker diarization (SD), and multimodal diarization (MD). Twenty-two teams registered for the challenge, and eighteen submitted systems for at least one of the three tasks.
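Speech-to-text tasks such as the STT condition named above are conventionally scored by word error rate (WER), i.e., the minimum number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length. As an illustrative sketch only (the challenge's official scoring is defined later in the paper), a minimal WER computation over word sequences might look like:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length,
    computed as a word-level Levenshtein distance via dynamic programming."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("el tiempo en españa hoy", "el tiempo españa hoy"))  # 1 deletion / 5 words = 0.2
```

Evaluation campaigns typically apply text normalization (casing, punctuation, numbers) before scoring, so a bare edit-distance computation like this is only the core of a real scoring pipeline.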
In this paper, we describe the challenge and the data provided to the participants. We also describe the systems submitted to the evaluation and their results, and draw a set of conclusions from this evaluation campaign.

This paper is organized as follows. Section 2 presents the RTVE2018 database. Section 3 describes the three evaluation tasks: speech-to-text transcription, speaker diarization, and multimodal diarization. Section 4 briefly describes the main features of the submitted systems. Section 5 presents the results, and Section 6 gives the conclusions.