Visualization of Speech Perception Analysis via Phoneme Alignment: A Pilot Study

Ratnanather, J. Tilak; Wang, Lydia C.; Bae, Seung-Ho; O’Neill, Erin R.; Sagi, Elad; Tward, Daniel J.

doi:10.3389/fneur.2021.724800

Cited by 2 publications

(2 citation statements)

References 47 publications


“…In addition, the use of ASR could open venues to improved (automated) scoring methods in audiology tests. Ratnanather et al (51) demonstrated how one can automate the alignment of phonemes based on the minimum edit distance between the source speech and the utterances of the subject in real time. Visualizing this alignment may provide insights to clinicians about what phonological errors are made.…”

Section: Discussion

mentioning

confidence: 99%

Preliminary Evaluation of Automated Speech Recognition Apps for the Hearing Impaired and Deaf

Pragt, Hengel, Grob et al. 2022

Front. Digit. Health


Objective: Automated speech recognition (ASR) systems have become increasingly sophisticated, accurate, and deployable on many digital devices, including on a smartphone. This pilot study aims to examine the speech recognition performance of ASR apps using audiological speech tests. In addition, we compare ASR speech recognition performance to normal hearing and hearing impaired listeners and evaluate if standard clinical audiological tests are a meaningful and quick measure of the performance of ASR apps.

Methods: Four apps have been tested on a smartphone, respectively AVA, Earfy, Live Transcribe, and Speechy. The Dutch audiological speech tests performed were speech audiometry in quiet (Dutch CNC-test), Digits-in-Noise (DIN)-test with steady-state speech-shaped noise, sentences in quiet and in averaged long-term speech-shaped spectrum noise (Plomp-test). For comparison, the app's ability to transcribe a spoken dialogue (Dutch and English) was tested.

Results: All apps scored at least 50% phonemes correct on the Dutch CNC-test for a conversational speech intensity level (65 dB SPL) and achieved 90–100% phoneme recognition at higher intensity levels. On the DIN-test, AVA and Live Transcribe had the lowest (best) signal-to-noise ratio +8 dB. The lowest signal-to-noise measured with the Plomp-test was +8 to 9 dB for Earfy (Android) and Live Transcribe (Android). Overall, the word error rate for the dialogue in English (19–34%) was lower (better) than for the Dutch dialogue (25–66%).

Conclusion: The performance of the apps was limited on audiological tests that provide little linguistic context or use low signal to noise levels. For Dutch audiological speech tests in quiet, ASR apps performed similarly to a person with a moderate hearing loss. In noise, the ASR apps performed more poorly than most profoundly deaf people using a hearing aid or cochlear implant. Adding new performance metrics including the semantic difference as a function of SNR and reverberation time could help to monitor and further improve ASR performance.
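The abstract reports word error rates for transcribed dialogue without stating how they were computed. As an illustration only, the sketch below uses the common definition of word error rate: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate`, the lowercasing, and the whitespace tokenization are assumptions made for this example, not details taken from the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are compared case-insensitively after splitting on
    whitespace; the study's own text normalization is not described here.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Because insertions are counted, this ratio can exceed 1.0 when the transcript is much longer than the reference.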


“…Responses to open-set sentences, especially when presented in background noise, include insertions (i.e., reporting words or phonemes that were not presented, or false starts such as "um") and deletions (i.e., not reporting words or phonemes that were presented), which makes it extremely difficult to create a one-to-one mapping of response phonemes to stimulus phonemes that is necessary for analyzing consonant feature errors. Automatic phoneme alignment algorithms have been developed for open-set responses to sentence-length stimuli (Bernstein et al, 1994, 2021; Ratnanather et al, 2022) in order to generate consonant confusion matrices (CMs) for sentence stimuli. However, consonant feature analysis based on such CMs is confounded by the context information in meaningful words and sentences.…”

mentioning

confidence: 99%

Consonant Perception in Connected Syllables Spoken at a Conversational Syllabic Rate

2023


Closed-set consonant identification, measured using nonsense syllables, has been commonly used to investigate the encoding of speech cues in the human auditory system. Such tasks also evaluate the robustness of speech cues to masking from background noise and their impact on auditory-visual speech integration. However, extending the results of these studies to everyday speech communication has been a major challenge due to acoustic, phonological, lexical, contextual, and visual speech cue differences between consonants in isolated syllables and in conversational speech. In an attempt to isolate and address some of these differences, recognition of consonants spoken in multisyllabic nonsense phrases (e.g., aBaSHaGa spoken as /ɑbɑʃɑɡɑ/) produced at an approximately conversational syllabic rate was measured and compared with consonant recognition using Vowel-Consonant-Vowel bisyllables spoken in isolation. After accounting for differences in stimulus audibility using the Speech Intelligibility Index, consonants spoken in sequence at a conversational syllabic rate were found to be more difficult to recognize than those produced in isolated bisyllables. Specifically, place- and manner-of-articulation information was transmitted better in isolated nonsense syllables than for multisyllabic phrases. The contribution of visual speech cues to place-of-articulation information was also lower for consonants spoken in sequence at a conversational syllabic rate. These data imply that auditory-visual benefit based on models of feature complementarity from isolated syllable productions may over-estimate real-world benefit of integrating auditory and visual speech cues.
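The citation statements above describe aligning response phonemes to stimulus phonemes by minimum edit distance and tallying the aligned pairs into a confusion matrix. The sketch below shows one generic way to do this with unit-cost Levenshtein alignment and a backtrace. It is not the algorithm of any cited paper: the names `align_phonemes`, `confusion_counts`, and `GAP`, the unit costs, and the tie-breaking order (substitution before deletion before insertion) are all assumptions for illustration.

```python
from collections import Counter

GAP = "-"  # hypothetical placeholder marking an insertion or deletion


def align_phonemes(stimulus, response):
    """Return (stimulus, response) phoneme pairs from a minimum edit distance alignment."""
    n, m = len(stimulus), len(response)
    # dist[i][j] = edit distance between stimulus[:i] and response[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(stimulus[i - 1] != response[j - 1])
            dist[i][j] = min(
                dist[i - 1][j - 1] + mismatch,  # match or substitution
                dist[i - 1][j] + 1,             # deletion (stimulus phoneme not reported)
                dist[i][j - 1] + 1,             # insertion (extra response phoneme)
            )

    # Backtrace from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + int(
            stimulus[i - 1] != response[j - 1]
        ):
            pairs.append((stimulus[i - 1], response[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((stimulus[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, response[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def confusion_counts(trials):
    """Tally aligned (stimulus, response) pairs over an iterable of trials."""
    counts = Counter()
    for stimulus, response in trials:
        counts.update(align_phonemes(stimulus, response))
    return counts


if __name__ == "__main__":
    trials = [
        (["k", "ae", "t"], ["k", "ae", "p"]),         # /t/ heard as /p/
        (["s", "t", "aa", "p"], ["t", "aa", "p"]),    # initial /s/ deleted
    ]
    for pair, count in sorted(confusion_counts(trials).items()):
        print(pair, count)
```

Several alignments can share the same minimum cost, so the tie-breaking order affects which confusions are counted. A feature-weighted substitution cost is one way to reduce such ties, but that choice is outside what the text above specifies.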
