2008
DOI: 10.1016/j.specom.2008.04.005
The use of articulator motion information in automatic speech segmentation

Cited by 13 publications (7 citation statements)
References 21 publications
“…Although the results obtained in this paper are very encouraging, future work is required (1) to improve the recognition accuracy for shorter delay values with a minimized number of user-defined parameters, (2) to extend recognition and test the approach using larger datasets of more vowels, consonants, words, and even sentences, and (3) to automatically segment training data [7], [28], which is necessary when larger datasets are available in the future. …”
Section: Discussion and Future Work
confidence: 99%
“…Most published work in this domain has used only lip or facial data, so-called visual speech recognition or automatic lip reading [8], because recording tongue motion is logistically difficult. Lip and facial data are also commonly used as an extra input source for acoustic speech recognition, in so-called articulatory speech recognition [9] or audio-visual speech recognition [7], [8]. However, the tongue is a very important articulator, particularly for vowels.…”
Section: Introduction
confidence: 99%
“…In this work, the HMM-based AS system proposed in a previous study [10] was used as the first stage of segmentation. The first-stage system uses the publicly available MOCHA-TIMIT database [7].…”
Section: S1
confidence: 99%
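The excerpt above refers to an HMM-based automatic segmentation (AS) system used as a first stage. As a rough illustration of the underlying idea, aligning a left-to-right HMM to a feature sequence and reading a phoneme boundary off the frame where the best state path switches, here is a minimal, self-contained Viterbi sketch. The two-state topology, Gaussian emissions, and all parameter values are illustrative assumptions, not details taken from the cited work.

```python
# Minimal sketch of HMM-based segmentation: Viterbi alignment of a
# left-to-right HMM to a 1-D feature sequence, returning the frame
# where the best state path switches (the phoneme boundary).
import math

def gauss_logpdf(x, mean, var):
    """Log density of a univariate Gaussian emission."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def viterbi_boundary(frames, means, var=1.0, stay=0.9):
    """Return the first frame assigned to state 1 (the boundary), or None."""
    n_states = len(means)
    log_stay, log_move = math.log(stay), math.log(1.0 - stay)
    # delta[s]: best log-prob of any path ending in state s at this frame.
    delta = [gauss_logpdf(frames[0], means[0], var)]
    delta += [-math.inf] * (n_states - 1)
    backptrs = []
    for x in frames[1:]:
        new_delta, bp = [], []
        for s in range(n_states):
            # Left-to-right topology: stay in state s, or move from s-1.
            cands = [(delta[s] + log_stay, s)]
            if s > 0:
                cands.append((delta[s - 1] + log_move, s - 1))
            best, prev = max(cands)
            new_delta.append(best + gauss_logpdf(x, means[s], var))
            bp.append(prev)
        delta = new_delta
        backptrs.append(bp)
    # Backtrace the best state path from the final frame.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for bp in reversed(backptrs):
        state = bp[state]
        path.append(state)
    path.reverse()
    return path.index(1) if 1 in path else None

# Frames cluster around the two state means; the switch is at frame 4.
frames = [0.1, -0.2, 0.0, 0.2, 3.9, 4.1, 4.0, 3.8]
print(viterbi_boundary(frames, means=[0.0, 4.0]))  # -> 4
```

A real first-stage system would use multi-state phone HMMs over MFCC (and, per the paper, articulator-motion) features, but the boundary-extraction step reduces to the same backtrace.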
“…Results: The proposed HMM boundary refinement method was tested on the MOCHA-TIMIT database using an HMM-based AS system as the first stage [10]. At first, the method was tested on only two phoneme-to-phoneme boundaries, the /y/-/uu/ boundary and the /t/-(/uu/ or /o/) boundary; these boundaries have 125 and 50 occurrences in the database, respectively.…”
Section: S1
confidence: 99%