Interspeech 2007
DOI: 10.21437/interspeech.2007-547

Automatic phonetic segmentation of Spanish emotional speech

Abstract: Unit selection is the state-of-the-art technique for achieving high-quality synthetic emotional speech. However, it requires a large, expensive, phonetically segmented corpus, so cost-effective automatic segmentation techniques should be studied. According to the HMM experiments in this paper, segmentation performance can depend heavily on the segmental or prosodic nature of the intended emotion (segmental emotions are more difficult to segment than prosodic ones), and several emotions should be combined to obtain a larger trai…
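
The HMM-based phonetic segmentation studied in the paper amounts to forced alignment: given the known phone sequence of an utterance, find the most likely frame-to-phone assignment and read boundaries off the state changes. The toy Python sketch below illustrates only the Viterbi alignment step at the heart of that procedure; it is not the paper's system, and the left-to-right topology, transition probabilities, and random frame scores are all illustrative assumptions.

    import numpy as np

    def force_align(log_likes, self_loop=np.log(0.6), advance=np.log(0.4)):
        """Viterbi forced alignment over a left-to-right phone chain.
        log_likes[t, s] = log-likelihood of frame t under phone state s."""
        T, S = log_likes.shape
        delta = np.full((T, S), -np.inf)   # best log-score ending in state s at frame t
        back = np.zeros((T, S), dtype=int)
        delta[0, 0] = log_likes[0, 0]      # alignment must start in the first phone
        for t in range(1, T):
            for s in range(S):
                stay = delta[t - 1, s] + self_loop
                move = delta[t - 1, s - 1] + advance if s > 0 else -np.inf
                back[t, s] = s if stay >= move else s - 1
                delta[t, s] = max(stay, move) + log_likes[t, s]
        path = [S - 1]                     # alignment must end in the last phone
        for t in range(T - 1, 0, -1):
            path.append(back[t, path[-1]])
        return path[::-1]                  # one phone-state index per frame

    # Toy run: 10 frames aligned to a 3-phone sequence with random scores.
    rng = np.random.default_rng(0)
    print(force_align(rng.normal(size=(10, 3))))

In a real segmenter the log-likelihoods would come from per-phone acoustic models trained on the emotional corpus, and phone boundaries fall wherever the aligned state index changes.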

Cited by 4 publications (1 citation statement). References 8 publications.
“…Firstly, two single-feature SVM-based systems were developed considering two different types of compact parameterizations: the average of the MFCCs and the average of the modulation spectrogram. The first parameterization, MFCC, is a very popular feature-extraction procedure in audio- and speech-related tasks (see, for example, [38,31]), and for this reason it was tried for the task under consideration in our previous work [10]. MFCCs are extracted on a frame-by-frame basis by applying the Discrete Cosine Transform to the log-mel spectrogram of the speech signal (see Subsection 4.2) and retaining the first 13 coefficients.…”
Section: Reference Systems
confidence: 99%
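
The extraction recipe in the quoted statement (DCT of the log-mel spectrogram, first 13 coefficients, then a per-utterance average as the compact SVM input) is straightforward to reproduce. A minimal Python sketch follows; the file name, sample rate, frame/hop sizes, and mel-band count are assumptions, and librosa is used only for convenience, not because the cited work uses it.

    import numpy as np
    import scipy.fftpack
    import librosa

    # Load a speech file (path and sample rate are placeholders).
    y, sr = librosa.load("utterance.wav", sr=16000)

    # Frame-by-frame power mel spectrogram: 25 ms windows, 10 ms hop at 16 kHz.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=400, hop_length=160, n_mels=40)

    # Log compression, then DCT along the mel-band axis; keep the first 13 coefficients.
    log_mel = np.log(mel + 1e-10)
    mfcc = scipy.fftpack.dct(log_mel, axis=0, type=2, norm="ortho")[:13]   # (13, frames)

    # Compact utterance-level parameterization: average across frames.
    mfcc_avg = mfcc.mean(axis=1)   # one 13-dimensional vector per utterance

The resulting fixed-length vectors are the kind of input a single-feature SVM system would consume, e.g. fitting sklearn.svm.SVC on one such vector per utterance.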