2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2014
DOI: 10.1109/icassp.2014.6854049

A pitch extraction algorithm tuned for automatic speech recognition

Abstract: In this paper we present an algorithm that produces pitch and probability-of-voicing estimates for use as features in automatic speech recognition systems. These features give large performance improvements on tonal languages for ASR systems, and even substantial improvements for non-tonal languages. Our method, which we are calling the Kaldi pitch tracker (because we are adding it to the Kaldi ASR toolkit), is a highly modified version of the getf0 (RAPT) algorithm. Unlike the original getf0 we do not make a …

Citations: cited by 267 publications (169 citation statements)
References: 9 publications
“…Rillard [25] and d'Allessandro [26] have suggested using the power of the speech signal instead, easing wRMSE calculation. We have opted for the latter, augmenting it with the POV calculated as detailed by Ghahremani [27]. Incorporating the POV in the weighing eliminates the need to hard threshold the POV to obtain voicing, making the whole approach more robust.…”
Section: Intonation Similarity Measures (mentioning)
confidence: 99%
“…The Kaldi pitch tracker was used for F0 and probability of voicing (POV) extraction [27]. We used 50ms frame length with 5ms frameshift for extraction.…”
Section: Tools and Settings (mentioning)
confidence: 99%
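
To make the frame settings quoted above concrete, here is a minimal sketch of how a 50 ms frame length and a 5 ms frame shift translate into a frame count. The 16 kHz sample rate, the function name `num_frames`, and the choice to count only frames that fit entirely inside the signal are assumptions for illustration; the citing paper does not state them, and toolkits differ in how they handle signal edges.

```python
def num_frames(num_samples: int,
               sample_rate: int = 16000,       # assumed; not given in the quoted text
               frame_length_ms: float = 50.0,  # frame length from the quoted statement
               frame_shift_ms: float = 5.0) -> int:
    """Count analysis frames that fit completely inside the signal."""
    frame_len = int(round(sample_rate * frame_length_ms / 1000.0))
    frame_shift = int(round(sample_rate * frame_shift_ms / 1000.0))
    if num_samples < frame_len:
        return 0
    # One frame at offset 0, then one more per full shift that still fits.
    return 1 + (num_samples - frame_len) // frame_shift


# One second of audio at the assumed 16 kHz sample rate.
frames_in_one_second = num_frames(16000)
```

With these assumptions each frame spans 800 samples and consecutive frames start 80 samples apart, so neighbouring frames overlap by 90%.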
“…We will use the WCORR norm, in order to assess the perceptual quality of the modelled F0 using the thresholds discussed in Section 4.2. To extract the continuous F0 and POV estimates we will use the pitch tracker implemented in Kaldi (Ghahremani et al, 2014). The second hypothesis is one of comparison of our generalised CR model with a state-of-the-art implementation of the standard CR model.…”
Section: Experiments Design (mentioning)
confidence: 99%
“…3. We define the weighting function to be (4), where p(i) is the probability of voicing (POV), as defined by Ghahremani et al (2014), and e(i) is the energy contour of the speech signal. This is in accord with newer trends in perceptual intonation studies d'Alessandro et al, 2011).…”
Section: Introduction (mentioning)
confidence: 99%
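
The citation statements above describe weighting an F0 error measure by the probability of voicing p(i) and the signal energy e(i) instead of hard-thresholding the POV. The sketch below illustrates that idea with a weighted RMSE. The citing paper's weighting function (its equation 4) is not reproduced in the quoted text, so the product `pov * energy` used here is an assumption, as is the function name `weighted_rmse`.

```python
import numpy as np


def weighted_rmse(f0_a, f0_b, pov, energy) -> float:
    """Weighted RMSE between two F0 contours of equal length.

    The weight w(i) = pov(i) * energy(i) is an illustrative assumption,
    not the weighting function defined in the citing paper.
    """
    f0_a = np.asarray(f0_a, dtype=float)
    f0_b = np.asarray(f0_b, dtype=float)
    w = np.asarray(pov, dtype=float) * np.asarray(energy, dtype=float)
    total = w.sum()
    if total <= 0.0:
        raise ValueError("weights sum to zero; no voiced, energetic frames")
    # Frames with low POV or low energy contribute little to the error,
    # so no hard voiced/unvoiced decision is needed.
    return float(np.sqrt(np.sum(w * (f0_a - f0_b) ** 2) / total))


# The second frame has POV 0, so its (zero) error carries no weight.
example = weighted_rmse([100.0, 200.0], [110.0, 200.0],
                        pov=[1.0, 0.0], energy=[1.0, 1.0])
```

Because the weights are continuous, a frame with an uncertain voicing estimate is down-weighted rather than discarded outright.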
“…The same Kaldi recipe was used (see https://github.com/bootphon/abkhazia/blob/master/abkhazia/kaldi/kaldi templates/train and decode.sh) with the same parameters and input features to train all models. Input features consisted of 13 MFCC coefficients plus 3 pitch-related features (Ghahremani et al, 2014) and their delta and delta-deltas coefficients. Pitch features were included because tone is contrastive in Mandarin and Vietnamese (i.e.…”
Section: ASR Models (mentioning)
confidence: 99%
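
The feature layout quoted above (13 MFCCs plus 3 pitch-related features, each with delta and delta-delta coefficients) gives (13 + 3) × 3 = 48 values per frame. The sketch below shows that bookkeeping on random placeholder data. It uses `np.gradient` as a simple stand-in for delta computation, which is an assumption for illustration; the recipe referenced in the citation may compute deltas differently, and the helper name `add_deltas` is hypothetical.

```python
import numpy as np


def add_deltas(feats: np.ndarray) -> np.ndarray:
    """Append first- and second-order time differences to a (frames, dims) matrix.

    np.gradient is a simplified stand-in for a delta computation,
    used here only to illustrate the resulting dimensionality.
    """
    delta = np.gradient(feats, axis=0)
    delta_delta = np.gradient(delta, axis=0)
    return np.concatenate([feats, delta, delta_delta], axis=1)


rng = np.random.default_rng(0)
mfcc = rng.standard_normal((100, 13))   # placeholder for 13 MFCC coefficients
pitch = rng.standard_normal((100, 3))   # placeholder for 3 pitch-related features
static = np.concatenate([mfcc, pitch], axis=1)  # 16 static dimensions per frame
full = add_deltas(static)                       # 48 dimensions per frame
```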