1999
DOI: 10.1016/s0167-6393(98)00085-5

Restructuring speech representations using a pitch-adaptive time–frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds

Cited by 1,654 publications (1,074 citation statements)
References 23 publications
“…The filter is often realised as minimum phase, derived by cepstral analysis of a smooth spectral envelope [2]. The source could be derived from the residual [3,4] or glottal signal [5], but is commonly just a pulse train / noise, alternating [6] or mixed [7].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…The 0th-through-59th mel-cepstral coefficients were used as the spectral parameter and F0 and 5 band-aperiodicity [15], [16] were used as excitation parameters. The STRAIGHT analysis-synthesis system [17] was used for the parameter extraction and waveform synthesis. The 0th mel-cepstral coefficients of the input speech were directly used as those of the converted speech.…”
Section: Experimental Conditions
Citation type: mentioning
confidence: 99%
“…It is thus expected that all processors could be embedded into the electrolarynx and total latency will be decreased to the 50 msec caused by the real-time statistical F0 prediction. ing the excitation signals based on the predicted F0 values are artificially generated using the STRAIGHT [19] analysis/synthesis method.…”
Section: A Simulation Experiments
Citation type: mentioning
confidence: 99%
“…As the source features, the spectral segment features were extracted from the mel-cepstra at the current ± 4 frames. On the other hand, F0 values of normal speech were extracted with STRAIGHT F0 analysis [19] and CF0 patterns were generated as the target feature using a low-pass filter with 10 Hz cut-off frequency. Moreover, the target F0 patterns were shifted so that their mean value was equal to 100 Hz to predict F0 patterns suitable for the source male speaker.…”
Section: Experimental Conditions
Citation type: mentioning
confidence: 99%