Tone Classification in Mandarin Chinese Using Convolutional Neural Networks

Chen, Charles; Bunescu, Răzvan; Xu, Li; Liu, Chang

doi:10.21437/interspeech.2016-528

Cited by 24 publications

(30 citation statements)

References 23 publications

“…[7,8] applies DNN to tone recognition on female corpus and some good results are achieved. More recently, [9] employs Convolutional Neural Network (CNN) for speech evaluation of the hearing-impaired population. However, feedforward neural networks like DNN and CNN are not designed to model time-series so that it is difficult to handle the F0 variations especially in continuous speech.…”

Section: Introduction (mentioning)

confidence: 99%

Improving Mandarin Tone Recognition Using Convolutional Bidirectional Long Short-Term Memory with Attention

2018

Automatic tone recognition is useful for Mandarin spoken language processing. However, the complex F0 variations from the tone co-articulations and the interplay effects among tonality make it rather difficult to perform tone recognition of Chinese continuous speech. This paper explored the application of Bidirectional Long Short-Term Memory (BLSTM), which had the capability of modeling time series, to Mandarin tone recognition to handle the tone variations in continuous speech. In addition, we introduced attention mechanism to guide the model to select the suitable context information. The experimental results showed that the performance of proposed CNN-BLSTM with attention mechanism was the best and it achieved the tone error rate (TER) of 9.30% with a 17.6% relative error reduction from the DNN baseline system with TER of 11.28%. It demonstrated that our proposed model was more effective to handle the complex F0 variations than other models.
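The relative error reduction quoted in the abstract above can be checked from the two reported TER values. The following is a minimal sketch; the function name `relative_error_reduction` is an illustrative assumption and does not come from the cited paper.

```python
def relative_error_reduction(baseline_ter: float, new_ter: float) -> float:
    """Fraction of the baseline error removed by the new system."""
    return (baseline_ter - new_ter) / baseline_ter


# TER values as reported in the abstract (in percent).
dnn_baseline_ter = 11.28
cnn_blstm_attention_ter = 9.30

rer = relative_error_reduction(dnn_baseline_ter, cnn_blstm_attention_ter)
print(f"{rer:.1%}")  # prints 17.6%
```

The absolute reduction is 1.98 percentage points; dividing by the 11.28% baseline gives the reported relative figure of about 17.6%.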

“…The highest accuracy (95.53%) was obtained using the convolutional neural network (CNN) and the mel-frequency cepstral coefficients (MFCCs) for tone pronunciation of 4500 syllables by 125 children aged 3 to 10. This study [34] showed that tone recognition by machine learning is possible; however, there are some shortcomings in learning tones with the tone recognition system. This study used children's speech, which had limitations on the pitch range, and the spectrogram and MFCCs of pronounced monosyllabic words in the paper showed that the third tone was a dipping-rising tone (Figure 4).…”

Section: Learning Mandarin Tone With Machine Learning (mentioning)

confidence: 98%

“…Machine learning data from previous studies on tone recognition [34]. Top: Time waveforms; Middle: Spectrograms; Bottom: Mel-frequency cepstral coefficients (MFCCs).…”

mentioning

confidence: 99%

Language Cognition and Pronunciation Training Using Applications

Kan; Ito

2020

Future Internet

In language learning, adults seem to be superior in their ability to memorize knowledge of new languages and have better learning strategies, experiences, and intelligence to be able to integrate new knowledge. However, unless one learns pronunciation in childhood, it is almost impossible to reach a native-level accent. In this research, we take the difficulties of learning tonal pronunciation in Mandarin as an example and analyze the difficulties of tone learning and the deficiencies of general learning methods using the cognitive load theory. With the tasks designed commensurate with the learner’s perception ability based on perception experiments and small-step learning, the perception training app is more effective for improving the tone pronunciation ability compared to existing apps with voice analysis function. Furthermore, the learning effect was greatly improved by optimizing the app interface and operation procedures. However, as a result of the combination of pronunciation practice and perception training, pronunciation practice with insufficient feedback could lead to pronunciation errors. Therefore, we also studied pronunciation practice using machine learning and aimed to train the model for the pronunciation task design instead of classification. We used voices designed as training data and trained a model for pronunciation training, and demonstrated that supporting pronunciation practice with machine learning is practicable.

“…Remarkably, they report that the MFCC-based recognizer handily outperforms the HDPF-based recognizer. Similarly, in [13], Chen et al train a convolutional network to take as input a window of MFCCs for a single tonal syllable and predict its tone. Although F0 can be estimated very accurately, these results show that F0-based features are not the best features for tone recognition, or at least that there is some information in the input signal that is lost when HDPFs alone are used.…”

Section: Existing Approaches (mentioning)

confidence: 99%

Tone Recognition Using Lifters and CTC

Lugosch; Tomar

2018

Interspeech 2018

In this paper, we present a new method for recognizing tones in continuous speech for tonal languages. The method works by converting the speech signal to a cepstrogram, extracting a sequence of cepstral features using a convolutional neural network, and predicting the underlying sequence of tones using a connectionist temporal classification (CTC) network. The performance of the proposed method is evaluated on a freely available Mandarin Chinese speech corpus, AISHELL-1, and is shown to outperform the existing techniques in the literature in terms of tone error rate (TER).
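The abstract above reports results as tone error rate (TER) but this page does not define it. One common convention, assumed here by analogy with word error rate, is the edit distance between predicted and reference tone sequences divided by the total number of reference tones. The sketch below follows that assumption; the function names and the example tone labels (1 to 4 for the lexical tones, 5 for the neutral tone) are hypothetical and are not taken from the cited paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[int], hyp: Sequence[int]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def tone_error_rate(refs, hyps) -> float:
    """Total edit distance divided by total reference length (assumed TER)."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


# Hypothetical tone sequences for two utterances.
refs = [[1, 4, 3, 2], [2, 2, 5]]
hyps = [[1, 4, 2, 2], [2, 5]]  # one substitution, one deletion
ter = tone_error_rate(refs, hyps)
print(f"TER = {ter:.2%}")
```

Under this definition, a CTC-style model that outputs a tone sequence without frame-level alignment can still be scored, because the edit distance aligns the two sequences itself.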
