Unsupervised learning of acoustic sub-word units

Varadarajan, Balakrishnan; Khudanpur, Sanjeev; Dupoux, Emmanuel

doi:10.3115/1557690.1557736

Cited by 70 publications

(50 citation statements)

References 4 publications


“…Speaker independence remains a major stumbling block [1] and improving it can be tackled in any of these three components. Given limited success of core recognition architectures in the zero resource setting, several alternative acoustic frontends and unsupervised acoustic models have been proposed in recent years [2,3,4,5,1,6,7,8,9,10], though there has been limited effort to evaluate these methods in a systematic way. Lexical discovery is the process of automatically identifying meaningful word-sized units from speech.…”

Section: Introduction (mentioning)

confidence: 99%

A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition

Jansen, Dupoux, Goldwater, et al. 2013

2013 IEEE International Conference on Acoustics, Speech and Signal Processing

Self Cite


We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding zero resource (unsupervised) speech technologies and related models of early language acquisition. Centered around the tasks of phonetic and lexical discovery, we consider unified evaluation metrics, present two new approaches for improving speaker independence in the absence of supervision, and evaluate the application of Bayesian word segmentation algorithms to automatic subword unit tokenizations. Finally, we present two strategies for integrating zero resource techniques into supervised settings, demonstrating the potential of unsupervised methods to improve mainstream technologies.


“…In these approaches, the transcription into labeled phones, syllables or words assumes a prior definition of these categories, even if the "semi-supervision" is only used to initialize a model that is then refined in an unsupervised fashion (cf. Ljolje et al 1997; Toledano et al 2003; Varadarajan et al 2008). …”

Section: Semi-supervised Approaches (mentioning)

confidence: 99%

A Computational Model of Unsupervised Speech Segmentation for Correspondence Learning

Duran, Schütze, Möbius, et al. 2010

Research on Language and Computation


In this paper, we develop a new conceptual framework for an important problem in language acquisition, the correspondence problem: the fact that a given utterance has different manifestations in the speech and articulation of different speakers and that the correspondence of these manifestations is difficult to learn. We put forward the Correspondence-by-Segmentation Hypothesis, which states that correspondence is primarily learned by first segmenting speech in an unsupervised manner and then mapping the acoustics of different speakers onto each other. We show that a rudimentary segmentation of speech can be learned in an unsupervised fashion. We then demonstrate that, using the previously learned segmentation, different instances of a word can be mapped onto each other with high accuracy when trained on utterance-label pairs for a small set of words.


“…We apply one such algorithm by Varadarajan et al [11], called the modified successive state splitting (SSS) algorithm, to our problem. We begin with a single-state HMM for each surgeme, and iteratively estimate the HMM parameters and increment the number of HMM states via SSS.…”

Section: Data-derived HMM Topologies (mentioning)

confidence: 99%

Data-Derived Models for Segmentation with Application to Surgical Assessment and Training

Varadarajan, Reiley, Lin, et al. 2009

Lecture Notes in Computer Science

Self Cite


This paper addresses automatic skill assessment in robotic minimally invasive surgery. Hidden Markov models (HMMs) are developed for individual surgical gestures (or surgemes) that comprise a typical bench-top surgical training task. It is known that such HMMs can be used to recognize and segment surgemes in previously unseen trials [1]. Here, the topology of each surgeme HMM is designed in a data-driven manner, mixing trials from multiple surgeons with varying skill levels, resulting in HMM states that model skill-specific sub-gestures. The sequence of HMM states visited while performing a surgeme is therefore indicative of the surgeon's skill level. This expectation is confirmed by the average edit distance between the state-level "transcripts" of the same surgeme performed by two surgeons with different expertise levels. Some surgemes are further shown to be more indicative of skill than others.
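
The comparison described in this abstract, an average edit distance between state-level transcripts, can be illustrated with a minimal sketch. It assumes a plain Levenshtein distance with unit costs for insertion, deletion and substitution, and an unweighted mean over all cross-group pairs; the abstract does not state the costs or any normalization, and the function names `edit_distance` and `average_edit_distance` are hypothetical.

```python
from itertools import product
from typing import Hashable, Sequence


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Levenshtein distance between two state sequences (unit costs assumed)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty sequence
        for j, y in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if x == y else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def average_edit_distance(
    group_a: Sequence[Sequence[Hashable]],
    group_b: Sequence[Sequence[Hashable]],
) -> float:
    """Mean edit distance over every pair (one transcript from each group)."""
    pairs = list(product(group_a, group_b))
    if not pairs:
        raise ValueError("both groups must contain at least one transcript")
    return sum(edit_distance(p, q) for p, q in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical HMM state-index transcripts of one surgeme, by skill group.
    expert = [[0, 1, 2, 3], [0, 1, 1, 2, 3]]
    novice = [[0, 4, 4, 2, 5, 3], [0, 4, 2, 5, 5, 3]]
    print(average_edit_distance(expert, novice))
```

The transcripts in the usage block are made-up state indices for illustration only, not data from the paper.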

