Noboru Kanedera scite author profile

ABSTRACT: We report on the effect of band-pass filtering of the time trajectories of spectral envelopes on speech recognition. Several types of filter (linear-phase FIR, DCT, and DFT) are studied. Results indicate the relative importance of different components of the modulation spectrum of speech for ASR. General conclusions are: (1) most of the useful linguistic information is in modulation frequency components from the range between 1 and 16 Hz, with the dominant component at around 4 Hz, (2) it is important to preserve the phase information in the modulation frequency domain, (3) the features which include components at around 4 Hz in the modulation spectrum outperform the conventional delta features, (4) the features which represent the several modulation frequency bands with appropriate center frequency and bandwidth increase recognition performance.
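As a rough illustration of band-pass filtering a time trajectory, the sketch below designs a windowed-sinc linear-phase FIR filter and applies it to one trajectory. It is not the filter design used in the paper: the function names, the 100 Hz frame rate, the 101-tap length, and the Hamming window are all assumptions chosen for the example. Only the 1 to 16 Hz passband comes from the abstract.

```python
import math

def bandpass_fir(low_hz, high_hz, frame_rate_hz, num_taps=101):
    """Windowed-sinc band-pass FIR taps; symmetric taps give linear phase.

    Hypothetical helper: tap count and Hamming window are illustrative choices.
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd for a centered symmetric filter")
    mid = num_taps // 2
    fl = low_hz / frame_rate_hz    # normalized cutoffs in cycles per frame
    fh = high_hz / frame_rate_hz
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2.0 * (fh - fl)
        else:
            # Difference of two ideal low-pass responses (high cutoff minus low cutoff).
            ideal = (math.sin(2 * math.pi * fh * k)
                     - math.sin(2 * math.pi * fl * k)) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def filter_trajectory(trajectory, taps):
    """Centered convolution with zero padding, so the filter delay is compensated."""
    mid = len(taps) // 2
    out = []
    for t in range(len(trajectory)):
        acc = 0.0
        for j, h in enumerate(taps):
            i = t + mid - j
            if 0 <= i < len(trajectory):
                acc += h * trajectory[i]
        out.append(acc)
    return out

# Example: keep 1-16 Hz modulations of one spectral-envelope trajectory
# sampled at an assumed 100 frames per second.
taps = bandpass_fir(1.0, 16.0, 100.0)
trajectory = [math.sin(2 * math.pi * 4 * t / 100) for t in range(400)]
filtered = filter_trajectory(trajectory, taps)
```

Because the taps are symmetric, every modulation component is delayed by the same number of frames, which is one simple way to respect the phase-preservation point in conclusion (2).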


Voice activity detection in noise using modulation spectrum of speech: Investigation of speech frequency and modulation frequency ranges

Pek, Arai, Kanedera. 2012. Acoust. Sci. & Tech.


Voice activity detection (VAD) in noisy environments is a very important preprocessing step in speech communication technology, a field which includes speech recognition, speech coding, speech enhancement, and video captioning. We have developed a VAD method for noisy environments based on the modulation spectrum. In Experiment 1, we investigate the optimal ranges of speech and modulation frequencies for the proposed algorithm using the simulated data in the CENSREC-1-C corpus. Results show that when we combine an upper limit frequency between 1,000 and 2,000 Hz with a lower limit frequency of less than 300 Hz as the speech frequency band, error rates are lower than with other bands. Furthermore, the proposed method performs VAD well when we use the frequency components of the modulation spectrum in any of the ranges 3-9, 3-11, 3-14, 3-18, 4-9, 4-11, 4-14, 4-18, 5-7, 5-9, 5-11, or 5-14 Hz. In Experiment 2, we use one of the best parameter settings from Experiment 1 and evaluate the real-environment data in the CENSREC-1-C corpus, comparing our method with other conventional methods. The VAD results improved for each SNR condition and noise type.
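The abstract does not spell out the decision rule, so the following is only a minimal sketch of the general idea: measure how much of an energy envelope's fluctuation falls inside a modulation band and compare it with a threshold. The function names, the 0.5 threshold, and the 100 Hz frame rate are assumptions; the 4-9 Hz band is one of the ranges listed above.

```python
import cmath
import math

def modulation_energy_ratio(envelope, frame_rate_hz, low_hz=4.0, high_hz=9.0):
    """Fraction of non-DC modulation energy that lies in [low_hz, high_hz].

    `envelope` is a short window of per-frame band energies (hypothetical input).
    """
    n = len(envelope)
    mean = sum(envelope) / n
    centered = [v - mean for v in envelope]
    in_band = 0.0
    total = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * frame_rate_hz / n
        # Plain DFT bin; adequate for the short windows used in this sketch.
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered))
        power = abs(coeff) ** 2
        total += power
        if low_hz <= freq <= high_hz:
            in_band += power
    return in_band / total if total > 0 else 0.0

def is_speech(envelope, frame_rate_hz, threshold=0.5):
    """Illustrative decision rule; the threshold is an assumption, not from the paper."""
    return modulation_energy_ratio(envelope, frame_rate_hz) > threshold

# One second of envelope at an assumed 100 frames per second,
# fluctuating at a syllable-like 5 Hz rate.
speech_like = [1.0 + 0.5 * math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
print(is_speech(speech_like, 100.0))
```

A practical system would compute the envelope only from the chosen speech frequency band (for example, below 300 Hz up to somewhere between 1,000 and 2,000 Hz, as reported above) and would tune the threshold on held-out data.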


Speech analysis/synthesis/conversion by using sequential processing

Panuthat, Funada, Kanedera. 1999.


This paper presents a method for speech analysis/synthesis/conversion by using sequential processing. The aims of this method are to improve the quality of synthesized speech and to convert the original speech into speech with different characteristics. We apply the Kalman filter to estimate the auto-regressive coefficients of the 'time-varying AR model with unknown input (ARUI model)', which we have proposed to improve on the conventional AR model, and we use a band-pass filter to make 'a guide signal' for extracting the pitch period from the residual signal. These signals are used to make the driving source signal in speech synthesis. We also use the guide signal for speech conversion, such as changing pitch and utterance length. Moreover, we show experimentally that, by using the smoothed auto-regressive coefficients, this method can analyze/synthesize/convert speech without causing instability.
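For orientation, here is a minimal sketch of tracking AR coefficients sequentially with a Kalman filter. It uses a plain time-varying AR model whose coefficients follow a random walk; it does not model the unknown input of the ARUI model, which the abstract does not define. The function name and the noise parameters `q` and `r` are assumptions.

```python
import random

def kalman_ar_track(signal, order=2, q=1e-6, r=1.0):
    """Track AR coefficients with a Kalman filter (random-walk state model).

    State: coefficient vector a. Observation: y[t] = sum_i a[i] * y[t-1-i] + noise.
    q (state noise) and r (observation noise) are illustrative values.
    """
    a = [0.0] * order
    P = [[1.0 if i == j else 0.0 for j in range(order)] for i in range(order)]
    history = []
    for t in range(order, len(signal)):
        h = [signal[t - 1 - i] for i in range(order)]  # past samples as regressors
        # Predict: coefficients stay put, uncertainty grows by q.
        for i in range(order):
            P[i][i] += q
        # Update with the new sample.
        Ph = [sum(P[i][j] * h[j] for j in range(order)) for i in range(order)]
        s = sum(h[i] * Ph[i] for i in range(order)) + r
        gain = [v / s for v in Ph]
        innovation = signal[t] - sum(a[i] * h[i] for i in range(order))
        a = [a[i] + gain[i] * innovation for i in range(order)]
        P = [[P[i][j] - gain[i] * Ph[j] for j in range(order)] for i in range(order)]
        history.append(list(a))
    return history

# Synthetic check on a stable AR(2) process with known coefficients.
rng = random.Random(0)
y = [0.0, 0.0]
for _ in range(2000):
    y.append(1.5 * y[-1] - 0.7 * y[-2] + rng.gauss(0.0, 1.0))
estimates = kalman_ar_track(y, order=2)
print(estimates[-1])  # final estimate, to compare against [1.5, -0.7]
```

The innovation sequence plays the role of a residual signal here; in the method described above, the pitch period is extracted from the residual with the help of a band-pass-filtered guide signal, which this sketch does not attempt.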


Subtopic segmentation in lecture speech for the creation of lecture video contents

et al. 2006


SUMMARY: Although still rare, video instructional materials which can be used over a network are on the increase. One of the reasons for the rarity of video instructional materials is thought to be the time and effort necessary for video editing. In this paper, the authors examine a method for automatically estimating subtopic segmentation positions from the speech information of an unedited lecture video, with the purpose of supporting the preparation of video instructional materials. Subtopic segmentation positions were estimated by comparing successive indexes using dynamic programming. The indexes were obtained by independent component analysis of text information obtained from the speech recognition processing of the video. Through an experiment using unedited lecture video from five instructors, the proposed method was found to have a segmentation capacity equal to or better than the Hearst method, while allowing the number of segments to be set freely. It was also confirmed that the subtopic segmentation capacity using speech recognition output was equivalent to that obtained with transcribed text.
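To illustrate how dynamic programming lets the number of segments be chosen freely, the sketch below splits a sequence of index vectors into a requested number of contiguous segments. The cost function (squared distance to each segment's mean) and all names are assumptions made for the example; the summary does not state the comparison measure the authors used, and the independent component analysis step is not shown.

```python
def segment_cost(vectors, start, end):
    """Sum of squared distances to the mean over vectors[start:end] (assumed cost)."""
    n = end - start
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors[start:end]) / n for d in range(dim)]
    return sum((v[d] - mean[d]) ** 2
               for v in vectors[start:end] for d in range(dim))

def segment(vectors, num_segments):
    """Return the boundary positions of the minimum-cost split into num_segments parts."""
    n = len(vectors)
    if not 1 <= num_segments <= n:
        raise ValueError("num_segments must be between 1 and len(vectors)")
    inf = float("inf")
    # best[k][end]: minimum cost of splitting vectors[:end] into k segments.
    best = [[inf] * (n + 1) for _ in range(num_segments + 1)]
    back = [[0] * (n + 1) for _ in range(num_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, num_segments + 1):
        for end in range(k, n + 1):
            for start in range(k - 1, end):
                if best[k - 1][start] == inf:
                    continue
                cost = best[k - 1][start] + segment_cost(vectors, start, end)
                if cost < best[k][end]:
                    best[k][end] = cost
                    back[k][end] = start
    # Walk the back-pointers to recover where each segment begins.
    boundaries = []
    end = n
    for k in range(num_segments, 1, -1):
        end = back[k][end]
        boundaries.append(end)
    return sorted(boundaries)

# Toy index vectors, one per utterance (hypothetical data).
indexes = [[1, 0]] * 3 + [[0, 1]] * 3 + [[1, 1]] * 2
print(segment(indexes, 3))
```

Each returned position is the index at which a new segment starts. Changing `num_segments` reuses the same procedure, which is the property the summary highlights relative to the Hearst method.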



On the relative importance of various components of the modulation spectrum for automatic speech recognition

On properties of modulation spectrum for robust automatic speech recognition
