The ACLEW DiViMe: An Easy-to-use Diarization Tool

Le Franc, Adrien; Riebling, Eric; Karadayi, Julien; Wang, Yun; Scaff, Camila; Metze, Florian; Cristià, Alejandrina

doi:10.21437/interspeech.2018-2324

Cited by 28 publications

(29 citation statements)

References 7 publications

“…Indeed, the two systems included in DiViMe performed near the bottom in a recent challenge called DiHARD, aimed at assessing diarization performance in "difficult" datasets, such as meetings and doctor-patient interviews [33]. But it is likely that the present recordings are considerably more difficult even than their "difficult" datasets, since (according to [16]) the DiViMe pipelines applied to the DiHARD standardized evaluation set led to DERs of 65-72%. The difference between these DiHARD scores in the 70's and the DERs averaging 110% obtained for the Tsimane dataset represents the additional difficulty posed by these data.…”

Section: Results (mentioning)
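
The DER figures quoted above can look puzzling, since an error rate above 100% seems impossible. Under the standard definition, DER adds up missed speech, false-alarm speech and speaker-confusion time, then divides by the total duration of reference speech. Because false-alarm time is not bounded by the amount of reference speech, a system that labels a lot of noise or silence as speech can exceed 100%. The Python sketch below only illustrates this arithmetic. The class and function names are hypothetical, and it ignores scoring details such as overlapping speech and optional forgiveness collars that real scoring tools handle.

```python
from dataclasses import dataclass


@dataclass
class ErrorTimes:
    """Hypothetical container: error durations (seconds) summed over a scored clip."""
    missed: float        # reference speech the system labelled as non-speech
    false_alarm: float   # non-speech the system labelled as speech
    confusion: float     # speech attributed to the wrong speaker
    total_speech: float  # total duration of reference speech


def diarization_error_rate(e: ErrorTimes) -> float:
    """Return DER in percent: (missed + false alarm + confusion) / reference speech.

    False alarms are not capped by the reference speech duration,
    so the result can exceed 100.
    """
    if e.total_speech <= 0:
        raise ValueError("total reference speech must be positive")
    return 100.0 * (e.missed + e.false_alarm + e.confusion) / e.total_speech
```

For instance, take an invented clip with 10 s of reference speech in which the system misses 3 s, adds 6 s of false-alarm speech and assigns 2 s to the wrong speaker. `diarization_error_rate(ErrorTimes(3.0, 6.0, 2.0, 10.0))` then gives 110.0. That matches the average reported for the Tsimane clips, although these numbers are made up for illustration and are not taken from the data.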

“…The Diarization Virtual MachinE (DiViMe for short) currently contains two tools that permit speech activity detection (i.e., detecting which portions of the recording contain some speech), and one tool for speaker diarization (i.e., attributing a speech portion to one or another speaker), which, combined, lead to two purely unsupervised pipelines yielding a segmentation of the recording into different speakers. In a recently accepted paper [16], global speech activity detection and talker diarization performance was reported.…”

Section: Diarization in Maximally Ecological Recordings: Data from Tsimane Children (mentioning)

Diarization in Maximally Ecological Recordings: Data from Tsimane Children

Karadayi, Scaff, Cristià

2018

6th Workshop on Spoken Language Technologies for Under-Resourced Languages (SLTU 2018)

Self Cite

Daylong recordings may be the most naturalistic and least invasive way to collect speech data, sampling all potential language use contexts with a device that is unobtrusive enough to have little effect on people's behaviors. As a result, this technology is relevant for studying diverse languages, including understudied languages in remote settings, provided we can apply effective unsupervised analysis procedures. In this paper, we analyze in detail the results of applying an open-source package (DiViMe) and a proprietary alternative (LENA™) to clips periodically sampled from daylong recorders worn by Tsimane children of the Bolivian Amazon (age range: 6-68 months; recording time per child: 4-22 h). Detailed analyses showed that the open-source package fared no worse than the proprietary alternative. However, performance was overall rather dismal. We suggest promising directions for improvement based on analyses of variation in performance within our corpus.

“…However, now that the basic concept and its functionality have been validated, the next efforts should be directed toward the development and testing of a robust speaker diarization module required for speaker attribution. Although the current ACLEW virtual machine published in Le Franc et al. (2018) already contains one such tool, DiarTK (Vijayasenan & Valente, 2012), its performance was found to be lacking on child daylong data (see also the DiHARD diarization challenge,¹⁰ where DiarTK scored at the bottom among all the submissions; see also Le Franc et al., 2018). In order to maintain focus on SAD and syllabifier comparisons, no separate experiments with diarization tools were included in the present report.…”

Section: Limitations and Future Work (mentioning)

Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech

Räsänen, Seshadri, Karadayi, et al.

2019

Preprint

Self Cite

Automatic word count estimation (WCE) from audio recordings can be used to quantify the amount of verbal communication in a recording environment. One key application of WCE is to measure language input heard by infants and toddlers in their natural environments, as captured by daylong recordings from microphones worn by the infants. Although WCE is nearly trivial for high-quality signals in high-resource languages, daylong recordings are substantially more challenging due to the unconstrained acoustic environments and the presence of near- and far-field speech. Moreover, many use cases of interest involve languages for which reliable ASR systems or even well-defined lexicons are not available. A good WCE system should also perform similarly for low- and high-resource languages in order to enable unbiased comparisons across different cultures and environments. Unfortunately, the current state-of-the-art solution, the LENA system, is based on proprietary software and has only been optimized for American English, limiting its applicability. In this paper, we build on existing work on WCE and present the steps we have taken towards a freely available system for WCE that can be adapted to different languages or dialects with a limited amount of orthographically transcribed speech data. Our system is based on language-independent syllabification of speech, followed by a language-dependent mapping from syllable counts (and a number of other acoustic features) to the corresponding word count estimates. We evaluate our system on samples from daylong infant recordings from six different corpora consisting of several languages and socioeconomic environments, all manually annotated with the same protocol to allow direct comparison. We compare a number of alternative techniques for the two key components in our system: speech activity detection and automatic syllabification of speech.
As a result, we show that our system can reach relatively consistent WCE accuracy across multiple corpora and languages (with some limitations). In addition, the system outperforms LENA on three of the four corpora consisting of different varieties of English. We also demonstrate how an automatic neural network-based syllabifier, when trained on multiple languages, generalizes well to novel languages beyond the training data, outperforming two previously proposed unsupervised syllabifiers as a feature extractor for WCE.
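
The two-stage design described in this abstract (language-independent syllable counting, then a language-dependent mapping to word counts) can be illustrated with a deliberately simplified Python sketch. For illustration only, it assumes that the mapping is an ordinary least-squares line fitted on clips whose word counts come from orthographic transcripts, with syllable count as the only predictor. The actual system also uses other acoustic features and may use a different regressor. All function names here are hypothetical.

```python
def fit_linear_mapping(syllables: list[float], words: list[float]) -> tuple[float, float]:
    """Hypothetical helper: least-squares fit of words ~= slope * syllables + intercept.

    `syllables` are per-clip syllable counts from a (language-independent)
    syllabifier; `words` are per-clip word counts from transcripts of the
    same clips in the target language.
    """
    n = len(syllables)
    if n != len(words) or n < 2:
        raise ValueError("need at least two paired clips")
    mean_s = sum(syllables) / n
    mean_w = sum(words) / n
    # Variance of syllable counts and covariance with word counts (unnormalized).
    var_s = sum((s - mean_s) ** 2 for s in syllables)
    if var_s == 0:
        raise ValueError("syllable counts must vary across clips")
    cov = sum((s - mean_s) * (w - mean_w) for s, w in zip(syllables, words))
    slope = cov / var_s
    intercept = mean_w - slope * mean_s
    return slope, intercept


def estimate_words(syllable_count: float, slope: float, intercept: float) -> float:
    """Hypothetical helper: map a new clip's syllable count to a word count, clamped at zero."""
    return max(0.0, slope * syllable_count + intercept)
```

Under these assumptions, adapting the sketch to a new language or dialect only means refitting two numbers on a small transcribed sample. The syllabifier itself stays unchanged, which is the property the abstract emphasizes.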

“…Although there is no real open-source, language-general, and population-general version of LENA, there is a relatively easy-to-use, open alternative being developed: DiViMe (Le Franc et al., 2018; ACLEW/DiViMe, 2018).…”

Section: Available Alternative Systems to LENA (mentioning)

A step-by-step guide to collecting and analyzing long-format speech environment (LFSE) recordings

Casillas, Cristià

2019

Collabra: Psychology

Self Cite

Recent years have seen rapid technological development of devices that can record communicative behavior as participants go about daily life. This paper is intended as an end-to-end methodological guidebook for potential users of these technologies, including researchers who want to study children's or adults' communicative behavior in everyday contexts. We explain how long-format speech environment (LFSE) recordings provide a unique view on language use and how they can be used to complement other measures at the individual and group level. We aim to help potential users of these technologies make informed decisions regarding research design, hardware, software, and archiving. We also provide information regarding ethics and implementation, issues that are difficult to navigate for those new to this technology, and on which little or no resources are available. This guidebook offers a concise summary of information for new users and points to sources of more detailed information for more advanced users. Links to discussion groups and community-augmented databases are also provided to help readers stay up-to-date on the latest developments.