New telephone speech corpora at CSLU

Cole, Ronald A.; Noel, Mike; Lander, Terri; Durham, Tessa

doi:10.21437/eurospeech.1995-188

Cited by 89 publications

(15 citation statements)

References 0 publications

“…A one-hour subset of Switchboard has also been labeled with respect to stress-accent by two individuals not involved in the phonetic annotation. These individuals also labeled two and a half hours of stress-accent material from a separate (phonetically annotated) corpus, "OGI Stories" [6], containing hundreds of telephone monologues (of ca. 60-seconds each).…”

Section: Into the Wilds (Of Spontaneous Speech)

Citation type: mentioning

confidence: 99%

From here to utility - melding phonetic insight with speech technology

Greenberg

2001

7th European Conference on Speech Communication and Technology (Eurospeech 2001)

An historic tension exists between science and technology with respect to spoken language. Over the coming decades this tension is likely to dissolve into a collaborative relationship melding linguistic knowledge with machine-learning and statistical methods as a means of developing mature science and technology pertaining to human-machine communication. In the process many mysteries surrounding the form and substance of spoken language are likely to be solved through the concerted efforts of scientists and engineers focused on the creation of "flawless" speech technology.

“…The ALPS transcription system was evaluated using spontaneous speech material from the Numbers95 corpus [1], collected and phonetically annotated (i.e., labeled and segmented) at the Oregon Graduate Institute. This corpus contains the numerical portion (mostly street addresses and phone numbers) of thousands of telephone dialogues and possesses a lexicon of 37 words and an inventory of 29 phonetic segments.…”

Section: Corpus Materials

Citation type: mentioning

confidence: 99%

“…The architecture of the TFM networks used for classification of the articulatory acoustic features was developed using a three-dimensional representation of the log-power-spectrum distributed across frequency and time that incorporates both the mean and variance of the energy distribution associated with multiple (typically, hundreds or thousands of) instances of a specific phonetic feature or segment derived from the phonetically annotated, OGI Stories-TS corpus [1]. Each phonetic-segment class was mapped to an array of articulatory phonetic features, and this map used to construct the spectrotemporal profile (STeP) for a given feature class.…”

Section: Spectro-temporal Profiles

Citation type: mentioning

confidence: 99%

Automatic phonetic transcription of spontaneous speech (American English)

Chang, Shastri, Greenberg

2000

6th International Conference on Spoken Language Processing (ICSLP 2000)

An automatic transcription system has been developed to label and segment phonetic constituents of spontaneous American English without benefit of a word-level transcript. Instead, special-purpose neural networks classify each 10-ms frame of speech in terms of articulatory-acoustic-based phonetic features and the feature clusters are subsequently mapped to phonetic-segment labels using multilayer perceptron networks. The phonetic labels generated by this system are 80% concordant with the labels produced by human transcribers and the segmental boundaries deviate from manual segmentation by an average of 11 ms. The automatic transcription system thus generates phonetic labels and segmentation comparable in quality to those produced by human transcribers, and therefore may prove useful for phonetic annotation of novel linguistic corpora, as well as facilitating development of pronunciation models for automatic speech recognition systems.

“…It is more meaningful to test the three phone types on a real-world recognition task. Four thousand phonetically transcribed names are selected from the OGI Names Corpus [2] with balanced genders. One hundred test sets of perplexity 40 are constructed by randomly choosing ten male speaking names and ten female speaking names 100 times without replacement.…”

Section: Isolated Word Recognition

Citation type: mentioning

confidence: 99%

“…A. Ljolje [8] used more detailed contextual effects to derive a set of 19 left-context classes and 18 right-context classes. (2) The data-driven approach evaluates all contexts in the training data, and uses some distance measure with a clustering algorithm to split or merge the contexts to a specified number of generalized contexts. This usually uses an information-theoretic distance measure commonly employed with Hidden Markov models.…”

Section: Introduction

Citation type: mentioning

confidence: 99%

Phone clustering using the Bhattacharyya distance

Mak, Barnard

1996

4th International Conference on Spoken Language Processing (ICSLP 1996)

In this paper we study using the classification-based Bhattacharyya distance measure to guide biphone clustering. The Bhattacharyya distance is a theoretical distance measure between two Gaussian distributions which is equivalent to an upper bound on the optimal Bayesian classification error probability. It also has the desirable properties of being computationally simple and extensible to more Gaussian mixtures. Using the Bhattacharyya distance measure in a data-driven approach together with a novel 2-Level Agglomerative Hierarchical Biphone Clustering algorithm, generalized left/right biphones (BGBs) are derived. A neural-net based phone recognizer trained on the BGBs is found to have better frame-level phone recognition than one trained on generalized biphones (BCGBs) derived from a set of commonly used broad categories. We further evaluate the new BGBs on an isolated-word recognition task of perplexity 40 and obtain a 16.2% error reduction over the broad-category generalized biphones (BCGBs) and a 41.8% error reduction over the monophones.
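The distance measure named in this abstract can be illustrated with a minimal sketch. It uses the standard closed form of the Bhattacharyya distance between two Gaussians and assumes diagonal covariances, which the abstract does not specify. The function names `bhattacharyya_diag` and `bayes_error_bound` are hypothetical and are not taken from the paper.

```python
import math


def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two Gaussians.

    Assumption: both covariances are diagonal, so each argument is a
    sequence of per-dimension means or variances of equal length.
    """
    if not (len(mean1) == len(var1) == len(mean2) == len(var2)):
        raise ValueError("all inputs must have the same dimensionality")
    distance = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v_avg = 0.5 * (v1 + v2)  # averaged variance for this dimension
        # Term driven by the separation of the means.
        distance += 0.125 * (m1 - m2) ** 2 / v_avg
        # Term driven by the mismatch of the variances.
        distance += 0.5 * math.log(v_avg / math.sqrt(v1 * v2))
    return distance


def bayes_error_bound(distance, prior1=0.5):
    """Upper bound on the two-class Bayes error: sqrt(P1 * P2) * exp(-D)."""
    return math.sqrt(prior1 * (1.0 - prior1)) * math.exp(-distance)


# Two unit-variance 1-D Gaussians whose means are 2 apart.
d = bhattacharyya_diag([0.0], [1.0], [2.0], [1.0])  # 0.5
bound = bayes_error_bound(d)  # 0.5 * exp(-0.5) with equal priors
```

A larger distance gives a smaller error bound, which is why such a measure could be used to decide which phone contexts are similar enough to merge. The clustering procedure itself is not reproduced here.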
