Automatic segmentation of the vocal signal precedes the feature-extraction and emotion recognition/classification stages. Prosodic parameters such as the fundamental frequency (F0) and the formants (F1-F4), together with the LPCC and MFCC cepstral coefficients, are extracted only from the vowel regions. The analysis tools of the SROL corpus use a hybrid hierarchical system with four segmentation methods based on the autocorrelation function, the AMDF method, cepstral analysis, and the HPS method. Since the performance of this instrument has not yet been satisfactory, we analyzed other segmentation approaches in order to obtain the best possible segmentation accuracy. The predictive neural network used in this paper is in fact a simple perceptron, which can approximate quasi-periodic signals such as vowels with high accuracy. Consonants, by contrast, have noise-like properties and involve complicated transition processes, so with a simple neural-network architecture the prediction error for consonants is higher than for vowels.
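The vowel/consonant discrimination idea above can be sketched as follows. The snippet below is an illustrative assumption, not the paper's actual network: a single linear neuron (a perceptron without a nonlinearity) is trained with the LMS rule to predict each sample from the previous ones, and its mean squared prediction error is compared on a synthetic quasi-periodic (vowel-like) signal versus a noise-like (consonant-like) signal. The function name `lms_prediction_error`, the predictor order, and the step size are all hypothetical choices for the demonstration.

```python
import numpy as np

def lms_prediction_error(signal, order=8, mu=0.01):
    """Train a single linear neuron to predict each sample from the
    previous `order` samples using the LMS rule; return the mean
    squared prediction error over the whole signal."""
    w = np.zeros(order)
    errors = []
    for n in range(order, len(signal)):
        x = signal[n - order:n]   # past samples as the neuron's input
        y_hat = w @ x             # linear one-step-ahead prediction
        e = signal[n] - y_hat     # prediction error for this sample
        w += mu * e * x           # LMS weight update
        errors.append(e * e)
    return float(np.mean(errors))

rng = np.random.default_rng(0)
t = np.arange(2000)
vowel_like = np.sin(2 * np.pi * t / 50)           # quasi-periodic signal
consonant_like = 0.5 * rng.standard_normal(2000)  # noise-like signal

err_vowel = lms_prediction_error(vowel_like)
err_consonant = lms_prediction_error(consonant_like)
print(err_vowel < err_consonant)  # quasi-periodic signal is easier to predict
```

Because a sinusoid is perfectly predictable from a short window of its past samples while white noise is not, the predictor's error stays low inside vowel-like regions and rises sharply on consonant-like regions, which is the property the segmentation exploits.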