1999
DOI: 10.1007/3-540-48239-3_35

Fast and Robust Features for Prosodic Classification

Cited by 7 publications (8 citation statements)
References 4 publications
“…We developed a database of 270 prosodic features describing pause, pitch, duration, and energy information in the vicinity of each word boundary, inspired by [2,17]. Features were extracted directly from the automatically aligned speech signal, so that no hand-labeling of prosody (such as ToBI) was necessary in model training.…”
Section: Prosodic Features (mentioning)
confidence: 99%
“…We originally developed a database of 270 prosodic features (inspired by [3,17]) that capture pause, pitch, duration, and energy information associated with each word boundary. Features were extracted directly from the automatically aligned speech signal.…”
Section: Prosodic Features (mentioning)
confidence: 99%
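
The statements above describe features computed at each word boundary from automatically aligned speech. As a rough illustration only, the sketch below derives one such feature, the pause length after each word, from word start and end times. The `Word` type, the function name, and the sample timings are hypothetical and are not taken from the cited papers.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    """One aligned word (hypothetical structure; times in seconds)."""
    text: str
    start: float
    end: float


def pause_after_each_word(words: List[Word]) -> List[float]:
    """Return the silence between each word's end and the next word's start.

    Overlapping alignments are clamped to a pause of 0.0.
    """
    return [max(0.0, nxt.start - cur.end) for cur, nxt in zip(words, words[1:])]


# Invented alignment for illustration.
words = [Word("so", 0.0, 0.25), Word("anyway", 0.25, 0.75), Word("yes", 1.25, 1.5)]
pauses = pause_after_each_word(words)
```

Pitch and energy features would additionally need the audio signal itself; only the timing-based feature is sketched here.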
“…One practical problem with using segmental durations in the feature set, for developers using third-party (off-the-shelf) speech recognition software, is that commercial systems typically do not provide phonetic time alignments with the word recognition output, though word times may be available. In [7], a substitute set of features is proposed that does not require phone time alignments but instead uses word durations normalized by summed phoneme average duration statistics. While they report good results for accent and boundary detection, the approach obscures cues known to occur at the syllable level.…”
Section: Introduction (mentioning)
confidence: 99%
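
The substitute feature described in the statement above, a word duration normalized by the summed average durations of its phonemes, can be sketched as follows. This is a minimal reading of that one sentence, not the cited paper's implementation: the phoneme table, its values, and the function name are assumptions made for illustration.

```python
from typing import Dict, Sequence

# Hypothetical corpus-wide mean phoneme durations in seconds (invented values).
MEAN_PHONE_DURATION: Dict[str, float] = {"k": 0.0625, "ae": 0.125, "t": 0.0625}


def normalized_word_duration(
    start: float,
    end: float,
    phones: Sequence[str],
    mean_durations: Dict[str, float],
) -> float:
    """Observed word duration divided by the sum of its phonemes' mean durations.

    Only word-level times are needed, so no phone time alignment is required.
    A value above 1.0 means the word was spoken more slowly than expected.
    """
    expected = sum(mean_durations[p] for p in phones)
    if expected <= 0.0:
        raise ValueError("expected duration must be positive")
    return (end - start) / expected


# The word "cat" observed from 1.0 s to 1.5 s, with an expected duration of 0.25 s.
ratio = normalized_word_duration(1.0, 1.5, ["k", "ae", "t"], MEAN_PHONE_DURATION)
```

Because the ratio is computed per word, lengthening confined to a single syllable is averaged over the whole word, which matches the limitation noted in the statement.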