In the development of a large-dictionary real-time speech recognition system, a commonly accepted approach is based on a multi-stage design (ref 1). In the first stages, starting from the acoustic data produced by uttering an item (syllable, word, sentence), a fast selection of a small subset of the vocabulary is performed. In the last stage, a detailed search for the most likely item is conducted over the previously identified subset. The selection, which must be as fast as possible, should always include the pronounced item; at the same time, it must have high resolving power, that is, it must keep the chosen subset small.

We approach the design of one of the stages by introducing equivalence classes among items, selected via the definition of an acoustical distance. Each item (a word in our case) is represented by a hidden Markov model (HMM), giving a statistical description of the relationship between words and acoustical data. We investigate two different definitions of distance between words: the first measures the capability of the model of one word to produce the acoustical data generated by uttering several instances of another word; the second is based on differences in the structure and parameters of the word models.

Starting from the resulting distance matrix, a classification method based on a minimal spanning tree is applied; it finds a classification that keeps low the number of words to be selected for the following detailed phase.
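The minimal-spanning-tree classification step described above can be sketched as follows. This is an illustrative assumption, not the paper's exact procedure: the toy distance matrix, the function names, and the criterion of cutting the k-1 longest MST edges to obtain k equivalence classes are all hypothetical choices standing in for whatever word-distance values the HMM-based definitions would produce.

```python
def mst_edges(dist):
    """Prim's algorithm: return the MST edges (i, j, d) of a full,
    symmetric distance matrix given as a list of lists."""
    n = len(dist)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Pick the cheapest edge crossing from the tree to the rest.
        i, j = min(((a, b) for a in in_tree for b in range(n) if b not in in_tree),
                   key=lambda e: dist[e[0]][e[1]])
        edges.append((i, j, dist[i][j]))
        in_tree.add(j)
    return edges

def classes_from_mst(dist, k):
    """Cut the k-1 longest MST edges; the connected components that
    remain are the k equivalence classes of words."""
    n = len(dist)
    edges = sorted(mst_edges(dist), key=lambda e: e[2])
    kept = edges[: n - k]  # drop the k-1 longest edges
    # Union-find over the kept edges to recover the components.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i, j, _ in kept:
        parent[find(i)] = find(j)
    groups = {}
    for w in range(n):
        groups.setdefault(find(w), []).append(w)
    return list(groups.values())

# Toy symmetric distance matrix for 5 "words": words 0, 1, 2 are
# acoustically close to each other, as are words 3 and 4, while the
# two groups are far apart.
D = [[0, 1, 2, 9, 9],
     [1, 0, 1, 9, 9],
     [2, 1, 0, 9, 9],
     [9, 9, 9, 0, 1],
     [9, 9, 9, 1, 0]]

print(classes_from_mst(D, 2))  # → [[0, 1, 2], [3, 4]]
```

Cutting the longest MST edges is equivalent to single-linkage clustering, so the resulting classes group words whose models are mutually close under the chosen acoustical distance; at recognition time, only the words in the class matched by the fast stage need to be passed to the detailed search.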