A Minimum Boundary Error Framework for Automatic Phonetic Segmentation

Kuo, Jen-Wei; Wang, Hsin‐Min

doi:10.1007/11939993_43

Cited by 5 publications

(4 citation statements)

References 8 publications


“…ASR systems are extensively used for the initial segmentation of speech. A HMM based phonetic recognizer is commonly employed for phoneme segmentation and for estimating the phoneme boundaries by means of Viterbi forced-alignment [5], [6].…”

Section: Introduction (mentioning)

confidence: 99%

Comparison of forced-alignment speech recognition and humans for generating reference VAD

Kraljevski, Bissiri

2015

Interspeech 2015


This paper asks whether forced-alignment speech recognition can be used as an alternative to humans in generating reference Voice Activity Detection (VAD) transcriptions. The level of agreement between the automatic and manual VAD transcriptions and the reference transcriptions produced by a human expert was investigated. Statistical analysis was then applied to the automatically produced and the collected manual transcriptions. Experimental results confirmed that forced-alignment speech recognition can provide accurate and consistent VAD labels.



“…The state of the art automatic segmentation systems are mainly HMM-based [1,2]. Boundaries produced by HMMs are subject to bias errors due to a range of factors, mainly, the training algorithms and the minimum duration of HMMs.…”

Section: Introduction (mentioning)

confidence: 99%

Framework for cross-language automatic phonetic segmentation

Ogbureke, Carson-Berndsen

2010

2010 IEEE International Conference on Acoustics, Speech and Signal Processing


Annotation of large multilingual corpora remains a challenge to the data-driven approach to speech research, especially for under-resourced languages. This paper presents cross-language automatic phonetic segmentation using Hidden Markov Models (HMMs). The underlying notion is segmentation based on articulation (manner and place), so as to provide extensive models that are applicable across languages. A test on the Appen Spanish speech corpus gives a phone recognition accuracy of 61.15% when bootstrapped with acoustic models trained on TIMIT, compared with a baseline of 54.63% for flat-start initialization of the monophone models.


“…An approach inspired in the minimum phone error training algorithm for automatic speech recognition [9] is presented in [10]. The objective of this approach is to minimize the expected boundary errors over a set of phonetic alignments represented as a phonetic lattice.…”

Section: Introduction (mentioning)

confidence: 99%

Improvements on Automatic Speech Segmentation at the Phonetic Level

Gómez, Calvo

2011

Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications


In this paper, we present some recent improvements in our automatic speech segmentation system, which needs only the speech signal and the phonetic sequence of each sentence of a corpus to be trained. It estimates a GMM using all the sentences of the training subcorpus, where each Gaussian distribution represents an acoustic class. The probability densities of these classes are combined with a set of conditional probabilities to estimate the probability densities of the states of each phonetic unit. The initial values of the conditional probabilities are obtained from a segmentation of each sentence that assigns the same number of frames to each phonetic unit. A DTW algorithm fixes the phonetic boundaries using the known phonetic sequence. This DTW is one step inside an iterative process that segments the corpus and re-estimates the conditional probabilities. The results presented here demonstrate that the system has a good capacity to learn how to identify the phonetic boundaries.
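
The initial segmentation mentioned in this abstract (the same number of frames for each phonetic unit) can be illustrated with a minimal Python sketch. The function name `uniform_segmentation`, the half-open frame spans, and the rule for distributing leftover frames are assumptions made for illustration; the abstract does not say how the authors handle frame counts that do not divide evenly.

```python
def uniform_segmentation(num_frames: int, num_units: int) -> list[tuple[int, int]]:
    """Split num_frames into num_units contiguous [start, end) frame spans.

    Assumption (not from the abstract): when num_frames is not divisible
    by num_units, the leftover frames go one each to the earliest units.
    """
    if num_units <= 0:
        raise ValueError("num_units must be positive")
    if num_frames < num_units:
        raise ValueError("need at least one frame per phonetic unit")

    base, extra = divmod(num_frames, num_units)
    spans = []
    start = 0
    for i in range(num_units):
        # The first `extra` units receive one additional frame.
        length = base + (1 if i < extra else 0)
        spans.append((start, start + length))
        start += length
    return spans


# Hypothetical example: a 10-frame sentence with 3 phonetic units.
initial = uniform_segmentation(10, 3)
```

Such a flat initial segmentation would only serve as a starting point; in the system described above, the boundaries are then refined iteratively by DTW.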
