Speaker-independent recognition of spoken English letters

Cole, Ronald A.; Fanty, Mark; Muthusamy, Yeshwant K.; Gopalakrishnan, Murali

doi:10.1109/ijcnn.1990.137693

Cited by 33 publications

(16 citation statements)

References 8 publications

“…Recognition accuracy of test data obtained from this experiment was 97.4%. This is favorable to the result reported in [2]. Note, however, that the result in this experiment is slightly lower than that of the speaker independent case.…”

Section: Experiments IV

contrasting

confidence: 40%

Signal modeling for isolated word recognition

Karnjanadecha

Zahorian

1999

1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258)

This paper presents speech signal modeling techniques which are well suited to high performance and robust isolated word recognition. Speech is encoded by a discrete cosine transform of its spectra, after several preprocessing steps. Temporal information is then also explicitly encoded into the feature set. We present a new technique for incorporating this temporal information as a function of temporal position within each word. We tested features computed with this method using an alphabet recognition task based on the ISOLET database. The HTK toolkit was used to implement the isolated word recognizer with whole word HMM models. The best result obtained based on 50 features and speaker independent alphabet recognition was 98.0%. Gaussian noise was added to the original speech to simulate a noisy environment. We achieved a recognition accuracy of 95.8% at a SNR of 15 dB. We also tested our recognizer with simulated telephone quality speech by adding noise and band limiting the original speech. For this "telephone" speech, our recognizer achieved 89.6% recognition accuracy. The recognizer was also tested in a speaker dependent mode, resulting in 97.4% accuracy on test data.

INTRODUCTION

Continuous speech recognition systems have been developed for many real-world applications, often using commercial low-cost speech recognition software. However, high performance and robust isolated word recognition, particularly for the letters of the alphabet recognizer and for digits, is still useful for many applications such as recognizing telephone numbers, spelled names and address, and ZIP codes.

Because of the potential applications, as mentioned above, many isolated word recognizers are optimized for the digits or alphabet or both (alphadigit). The alphabet recognition task is particularly difficult because there are many highly confusable letters in the alphabet set, for example the great acoustic similarity among the letters of the E-set (b, c, d, e, g, p, t, v, z) or for the (m, n) pair. Also, since language models cannot generally be used, the alphabet recognition task is a small, challenging, and potentially useful problem for evaluating acoustic signal modeling and word recognition methods.

Several techniques have been proposed to improve isolated word recognition systems. For example, the best result in a speaker independent alphabet recognition was obtained using a multi-tier phoneme-based Hidden Markov Model (HMM) recognizer [5]. Disadvantages of phoneme-based HMM recognizers are the system complexity and the phonetic transcription of the training words has to be known.

The main contribution of this paper is to present a method for isolated word recognition which is easier to implement than the state of the art systems introduced to date, and one which gives better performance than any of these previously introduced systems.

The ISOLET database, [1], was used for all experiments reported in this paper. This LDC distributed database was intended for evaluation of isolated word recognizers and it has therefore ...

“…3 for the Alphabet Letters corpus is from a neural network recognizer designed specifically for recognizing isolated letters (Cole et al., 1990). This recognizer was trained using one repetition of each letter from 60 (…”

mentioning

confidence: 99%

Speech recognition by machines and humans

Lippmann

1997

Speech Communication

This paper reviews past work comparing modern speech recognition systems and humans to determine how far recent dramatic advances in technology have progressed towards the goal of human-like performance. Comparisons use six modern speech corpora with vocabularies ranging from 10 to more than 65,000 words and content ranging from read isolated words to spontaneous conversations. Error rates of machines are often more than an order of magnitude greater than those of humans for quiet, wideband, read speech. Machine performance degrades further below that of humans in noise, with channel variability, and for spontaneous speech. Humans can also recognize quiet, clearly spoken nonsense syllables and nonsense sentences with little high-level grammatical information. These comparisons suggest that the human-machine performance gap can be reduced by basic research on improving low-level acoustic-phonetic modeling, on improving robustness with noise and channel variability, and on more accurately modeling spontaneous speech. © 1997 Elsevier Science B.V.

“…A speaker-independent spoken English alphabet recognition system was designed by Cole et al [5]. That system was trained on one token of each letter from 120 speakers.…”

Section: Spoken Alphabets Recognition

mentioning

confidence: 99%

Speech Recognition System of Arabic Alphabet Based on a Telephony Arabic Corpus

Alotaibi

Alghamdi

Alotaiby

2010

Lecture Notes in Computer Science

Abstract. Automatic recognition of spoken alphabets is one of the difficult tasks in the field of computer speech recognition. In this research, spoken Arabic alphabets are investigated from the speech recognition problem point of view. The system is designed to recognize spelling of an isolated word. The Hidden Markov Model Toolkit (HTK) is used to implement the isolated word recognizer with phoneme based HMM models. In the training and testing phase of this system, isolated alphabets data sets are taken from the telephony Arabic speech corpus, SAAVB. This standard corpus was developed by KACST and it is classified as a noisy speech database. A hidden Markov model based speech recognition system was designed and tested with automatic Arabic alphabets recognition. Four different experiments were conducted on these subsets, the first three trained and tested by using each individual subset, the fourth one conducted on these three subsets collectively. The recognition system achieved 64.06% overall correct alphabets recognition using mixed training and testing subsets collectively.
