The Relative Weight of Temporal Envelope Cues in Different Frequency Regions for Mandarin Disyllabic Word Recognition

Zheng, Zhong; Li, Keyi; Guo, Yang; Xin-rong, Wang; Xiao, Lv; Liu, Chengqi; He, Shan; Feng, Gang; Feng, Yuanming

doi:10.3389/fnins.2021.670192

Cited by 3 publications

(1 citation statement)

References 50 publications

(57 reference statements)


“…Lexical tone duration was normalized for lexical tone tokens to minimize bias on tonal perception (Jing et al., 2017). The speech materials were filtered into 30 contiguous frequency bands using zero-phase, third-order Butterworth filters (18 dB/oct slopes), ranging from 80 to 7,562 Hz (Li et al., 2016; Guo et al., 2017; Zheng et al., 2021). Each frequency band was an equivalent rectangular bandwidth for normal people, which simulates the frequency selection of normal auditory system (Glasberg and Moore, 1990).…”

Section: Methods (mentioning)

confidence: 99%
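The band layout in the quoted statement can be illustrated with a short sketch. It assumes that each of the 30 bands is exactly one ERB-number unit wide and uses the ERB-number scale commonly attributed to Glasberg and Moore (1990), ERB-number = 21.4 log10(4.37 f / 1000 + 1); the function names are hypothetical and the quoted text does not confirm that the authors computed their edges this way. The Butterworth filtering itself is not shown.

```python
import math


def hz_to_erb_number(f_hz):
    """Map frequency in Hz to ERB-number (Glasberg and Moore 1990 scale)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_number_to_hz(erb):
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (erb / 21.4) - 1.0) * 1000.0 / 4.37


def band_edges(f_low_hz, n_bands, erb_per_band=1.0):
    """Return n_bands + 1 edge frequencies for contiguous bands.

    Assumption: every band spans erb_per_band units on the ERB-number scale,
    starting at f_low_hz.
    """
    start = hz_to_erb_number(f_low_hz)
    return [erb_number_to_hz(start + i * erb_per_band) for i in range(n_bands + 1)]


# 30 contiguous one-ERB bands starting at 80 Hz, as in the quoted statement.
edges = band_edges(80.0, 30)
for i in range(len(edges) - 1):
    print(f"band {i + 1:2d}: {edges[i]:8.1f} to {edges[i + 1]:8.1f} Hz")
```

Under these assumptions the upper edge of band 30 comes out close to the 7,562 Hz limit quoted above, which suggests the one-ERB-per-band reading is at least consistent with the stated range.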

Relative Weights of Temporal Envelope Cues in Different Frequency Regions for Mandarin Vowel, Consonant, and Lexical Tone Recognition

Zheng

Feng

et al. 2021

Front. Neurosci.

Self Cite


Objectives: Mandarin-speaking users of cochlear implants (CI) perform more poorly than their English-speaking counterparts. This may be because present CI speech coding schemes are largely based on English. This study aims to evaluate the relative contributions of temporal envelope (E) cues to Mandarin phoneme (vowel and consonant) and lexical tone recognition, to provide information for speech coding schemes specific to Mandarin.

Design: Eleven normal-hearing subjects were studied using acoustic temporal E cues that were extracted from 30 contiguous frequency bands between 80 and 7,562 Hz using the Hilbert transform and divided into five frequency regions. Percent-correct recognition scores were obtained with acoustic E cues presented in three, four, and five frequency regions, and their relative weights were calculated using the least-squares approach.

Results: For stimuli with three, four, and five frequency regions, percent-correct scores for vowel recognition using E cues were 50.43–84.82%, 76.27–95.24%, and 96.58%, respectively; for consonant recognition, 35.49–63.77%, 67.75–78.87%, and 87.87%; for lexical tone recognition, 60.80–97.15%, 73.16–96.87%, and 96.73%. For frequency regions 1 to 5, the mean weights in vowel recognition were 0.17, 0.31, 0.22, 0.18, and 0.12, respectively; in consonant recognition, 0.10, 0.16, 0.18, 0.23, and 0.33; in lexical tone recognition, 0.38, 0.18, 0.14, 0.16, and 0.14.

Conclusion: The region that contributed most to vowel recognition was Region 2 (502–1,022 Hz), which contains first formant (F1) information; Region 5 (3,856–7,562 Hz) contributed most to consonant recognition; Region 1 (80–502 Hz), which contains fundamental frequency (F0) information, contributed most to lexical tone recognition.
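The least-squares weighting mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a simple linear model in which the score for a condition is the sum of the weights of the regions presented, with weights normalized to sum to 1. The scores are synthetic, generated from the vowel weights reported in the abstract purely so that the fit can be checked; they are not the study's data. All function names are hypothetical.

```python
from itertools import combinations


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def least_squares_weights(design, scores):
    """Fit scores ~ design . w via the normal equations, then normalize w."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, scores)) for i in range(k)]
    raw = solve_linear(xtx, xty)
    total = sum(raw)
    return [w / total for w in raw]


N_REGIONS = 5

# One row per condition: 1.0 if the region is presented, 0.0 otherwise.
# Conditions cover every combination of three, four, and five regions.
design = [[1.0 if r in combo else 0.0 for r in range(N_REGIONS)]
          for size in (3, 4, 5)
          for combo in combinations(range(N_REGIONS), size)]

# SYNTHETIC scores built from the reported vowel weights (assumption:
# purely additive model, no noise). Real percent-correct data would go here.
true_weights = [0.17, 0.31, 0.22, 0.18, 0.12]
scores = [sum(x * w for x, w in zip(row, true_weights)) for row in design]

weights = least_squares_weights(design, scores)
print([round(w, 2) for w in weights])
```

Because the synthetic scores are noise-free and exactly additive, the fit returns the generating weights; with measured scores the residual would be nonzero and the normalization step would matter more.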


No Musician Advantage in the Perception of Degraded–Fundamental Frequency Speech in Noisy Environments

Hsieh

Guo

2023

J Speech Lang Hear Res


Purpose: Pitch variations of the fundamental frequency (fo) contour contribute to speech perception in noisy environments, but whether musicians have an advantage in speech in noise (SIN) with altered fo information remains unclear. This study investigated the effects of different levels of degraded fo contour (i.e., conveying lexical tone or intonation information) on musician advantage in speech-in-noise perception.

Method: A cohort of native Mandarin Chinese speakers, comprising 30 trained musicians and 30 nonmusicians, was tested on the intelligibility of Mandarin Chinese sentences with natural, flattened-tone, flattened-intonation, and flattened-all fo contours embedded in background noise at three signal-to-noise ratios (0, −5, and −9 dB). Pitch difference thresholds and innate musical skills associated with speech-in-noise benefits were also assessed.

Results: Speech intelligibility scores improved with increasing signal-to-noise ratio for both musicians and nonmusicians. However, no musician advantage was observed for identifying any type of flattened-fo contour SIN. Musicians exhibited smaller fo pitch discrimination limens than nonmusicians, which correlated with benefits for perceiving speech with intact tone-level fo information. Regardless of musician status, performance on the pitch and accent musical-skill subtests correlated with speech intelligibility score.

Conclusions: Collectively, these results provide no evidence for a musician advantage for perceiving speech with distorted fo information in noisy environments. Results further show that perceptual musical skills in pitch and accent processing may benefit the perception of SIN, independent of formal musical training. Our findings suggest that the potential application of music training in speech perception in noisy backgrounds is not contingent on the ability to process fo pitch contours, at least for Mandarin Chinese speakers.
Supplemental Material: https://doi.org/10.23641/asha.23706354


The Contribution of Sub-Band Temporal Fine Structure to the Intelligibility of Uyghur Sentences

Song,

Huang,

et al. 2024

2024 IEEE 5th International Conference on Pattern Recognition and Machine Learning (PRML)

