Interspeech 2018
DOI: 10.21437/interspeech.2018-1047

Comparison of Syllabification Algorithms and Training Strategies for Robust Word Count Estimation across Different Languages and Recording Conditions

Abstract: Word count estimation (WCE) from audio recordings has a number of applications, including quantifying the amount of speech that language-learning infants hear in their natural environments, as captured by daylong recordings made with devices worn by infants. To be applicable in a wide range of scenarios and also low-resource domains, WCE tools should be extremely robust against varying signal conditions and require minimal access to labeled training data in the target domain. For this purpose, earlier work has…

Cited by 9 publications (14 citation statements)
References 16 publications
“…Nevertheless, this downside of "traditional" syllabification algorithms can potentially be tackled with machine learning approaches. By training the system using a large number of syllables from different languages (potentially from different noise conditions), a machine learning algorithm may learn a general solution for syllable counting that tolerates nonspeech audio content better than the traditional signal processing approaches (e.g., Räsänen, Seshadri, & Casillas, 2018b;. This makes syllables a potential candidate unit that could be estimated directly from speech instead of using correlated features such as speech duration or phone counts as linear correlates of syllable counts.…”
Section: Technical Considerations (mentioning)
confidence: 99%
“…ALICE uses the SylNet baseline model from the original paper that is trained on approximately 10 hours of handannotated Estonian and Korean speech, but further adapting the model to the present daylong child-centered data using the standard adaptation procedure of the algorithm (see Experimental Setup for details). In our initial experiments on child-centered audio, SylNet was also compared to syllabifiers from Wang and Narayanan (2007), Räsänen, Doyle, and Frank (2018a), and Räsänen, Seshadri, and Casillas (2018b). Since SylNet systematically showed superior performance to the tested alternatives, we only report ALICE performance, with SylNet as the chosen syllablebased feature extractor.…”
Section: Syllable Count Estimation (mentioning)
confidence: 99%
“…However, this downside of "traditional" syllabification algorithms can potentially be tackled with machine learning approaches. By training the system using a large number of syllables from different languages (potentially from different noise conditions), a machine learning algorithm may learn a general solution for syllable counting that better tolerates non-speech audio content than the traditional signal processing approaches (e.g., Räsänen, Seshadri & Casillas, 2018;. This makes syllables a potential candidate unit that could be estimated directly from speech instead of using correlated features such as speech duration or phone counts as linear correlates of syllable counts.…”
Section: Technical Considerations (mentioning)
confidence: 99%
“…Okko Rasanen, Shreyas Seshadri et. al [6] presents an Automatic Word Count Estimation (AWCE) to estimate a number of words spoken in an audio recording. But this method is used only for a words that are in an audio format.…”
Section: Related Work (mentioning)
confidence: 99%