Spoken language resources for Cantonese speech processing

Lee, Tan; Lo, Wai Kit; Ching, P.C.; Meng, Helen

doi:10.1016/s0167-6393(00)00101-1

Cited by 82 publications

(48 citation statements)

References 22 publications

“…The population F0 range was estimated for male and female Cantonese talkers separately from a large-scale speech corpus, which contains read speech materials from 68 native Cantonese speakers, with half of the speakers in each gender (Lee et al, 2002). The upper and lower F0 range was measured from the average F0 of words carrying the highest tone and lowest tone produced by all female and male speakers respectively.…”

Section: Methods (mentioning)

confidence: 99%

Toward an integrative model of talker normalization.

Zhang, Chen

2016

Journal of Experimental Psychology: Human Perception and Perfor

Successful speech perception requires accurate mapping of speech signals to linguistic categories despite talker variation in signals. Although factors like intrinsic and context cues have been identified, a full understanding of talker normalization remains to be achieved. In particular, it is important to examine the cocontribution of intrinsic, extrinsic and other cues in an integrative way. In Experiment 1, we examined the effect of intrinsic cues and typicality of a talker’s F0 range relative to population F0 range on word identification in isolation. In Experiment 2, we compared the effects of 4 contexts to identify those that consistently facilitate talker normalization. We found that without contexts, word identification accuracy was low and variable depending on talker typicality. Contexts improved performance across all talkers regardless of typicality. But only meaningless and meaningful speech contexts with cues to a talker’s acoustic-phonological space showed consistent effects. We proposed a new model, integrating talker typicality, talker familiarity, and context. Whereas speech signals from familiar or typical talkers may be accurately identified standing alone, a context with cues to a talker’s acoustic-phonological space is necessary in the case of unfamiliar and atypical talkers. It is thus the first model that integrates memory and context effects.

“…The high insertion rate, very possibly, is due to the fact that all Cantonese digits are monosyllabic; the short duration and simple phonetic content also make them prone to insertions, especially in noise. Similar observations have been reported before: "One of the major sources of errors was due to frequent insertions of digit '5', pronounced as a mono-syllabic nasal[ng5], which may be confused with and treated as part of the nasal coda in the digits '0'[ling4] or '3'[saam1]" [7]. For different noises, error patterns show more varieties: in white noise, digits and silence tend to be misrecognized as ' The sensitivities of the error patterns to SNR's can be further illustrated in Fig.…”

Section: Comparison and Discussion (supporting)

confidence: 82%

“…In this study, CUDigit [7], a continuous Cantonese digit database collected at the Chinese University of Hong Kong is used. It consists of 25 male and 25 female speakers.…”

Section: Clean Database (mentioning)

confidence: 99%

On noise robustness of dynamic and static features for continuous Cantonese digit recognition

Chen

Soong

Lee

SympoTIC '04. Joint 1st Workshop on Mobile Future & Symposium on Trends in Communications (IEEE Cat. No.04EX877)

It has been shown previously that augmented spectral features (static and dynamic cepstra) are effective for improving ASR performance in a clean environment. In this paper we investigate the noise robustness of static and dynamic cepstral features, in a speaker independent, continuous recognition task by using a noise-added, Cantonese digit database (CUDigit). We found that the dynamic cepstrum is more robust to additive, background noise than its static counterpart. The results are consistent across different types of noise and under various SNRs. Exponential weights which can exploit the unequal robustness of two features are optimally trained in a development set. A relative word error rate reduction of 41.9%, mainly on a significant reduction of insertions, is obtained on the test data under various noise and SNR conditions.

“…Cantonese and English read speech data were sourced from existing data from the CUSENT [18] and WSJ0 [19] corpora to train background models. For each language, mixed-condition training was also carried out by mixing the background data with the ShefCE training data, to provide mixed-condition models.…”

Section: Speech Recognition Systems (mentioning)

confidence: 99%

ShefCE: A Cantonese-English bilingual speech corpus for pronunciation assessment

Kwan

Lee

et al. 2017

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite

This paper introduces the development of ShefCE: a Cantonese-English bilingual speech corpus from L2 English speakers in Hong Kong. Bilingual parallel recording materials were chosen from TED online lectures. Script selection was carried out according to bilingual consistency (evaluated using a machine translation system) and the distribution balance of phonemes. 31 undergraduate to postgraduate students in Hong Kong aged 20-30 were recruited and recorded a 25-hour speech corpus (12 hours in Cantonese and 13 hours in English). Baseline phoneme/syllable recognition systems were trained on background data with and without the ShefCE training data. The final syllable error rate (SER) for Cantonese is 17.3% and the final phoneme error rate (PER) for English is 34.5%. The automatic speech recognition performance on English showed a significant mismatch when applying L1 models to L2 data, suggesting the need for explicit accent adaptation. ShefCE and the corresponding baseline models will be made openly available for academic research.
