This paper describes a method to control prosodic features using phonetic and prosodic symbols as input to attention-based sequence-to-sequence (seq2seq) acoustic modeling (AM) for neural text-to-speech (TTS). The method inserts a sequence of prosodic symbols between phonetic symbols, which are then used to reproduce prosodic acoustic features, i.e., accents, pauses, accent breaks, and sentence endings, in several seq2seq AM methods. The proposed phonetic and prosodic labels have simple descriptions and a low production cost. By contrast, the labels of conventional statistical parametric speech synthesis methods are complicated, and the cost of time alignment, such as annotating phoneme boundaries, is high. The proposed method does not require phoneme boundary positions. We also propose a method for automatically converting conventional labels and show how to automatically reproduce pitch accents and phonemes. The results of objective and subjective evaluations show the effectiveness of our method.
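As a rough illustration of the input representation described above, the sketch below interleaves prosodic symbols with a phoneme sequence to form a single seq2seq input. The symbol inventory ("[" for accent rise, "]" for accent fall, "_" for pause, "#" for accent break, "." for sentence ending) and the placement rules are illustrative assumptions, not the paper's exact label set.

```python
# Minimal sketch: insert prosodic symbols between phonetic symbols so that a
# single flat sequence can be fed to a seq2seq acoustic model. The specific
# symbols and the event positions below are hypothetical.

PROSODIC_SYMBOLS = {"[", "]", "_", "#", "."}

def build_input_sequence(phonemes, prosody_events):
    """Insert prosodic symbols at given phoneme indices.

    phonemes:       list of phonetic symbols, e.g. ["k", "o", "N", ...]
    prosody_events: list of (index, symbol) pairs; each symbol is inserted
                    immediately *before* the phoneme at that index.
    """
    events = {i: s for i, s in prosody_events}
    out = []
    for i, ph in enumerate(phonemes):
        if i in events:
            out.append(events[i])
        out.append(ph)
    return out

if __name__ == "__main__":
    phonemes = list("konnichiwa")
    # Hypothetical prosody: accent rise before index 1, accent fall before
    # index 3, sentence-ending symbol appended at the end.
    seq = build_input_sequence(phonemes, [(1, "["), (3, "]")]) + ["."]
    print(" ".join(seq))
```

Because the prosodic symbols are simply additional tokens in the input sequence, no phoneme boundary times are needed; the attention mechanism learns how each symbol maps to the acoustic features.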
This paper presents a portable real-time speech rate conversion system that compensates for the reduced listening intelligibility of elderly listeners who have difficulty following rapid speech. The system enables an elderly user to convert the speech rate as desired in real time, preserving pitch and introducing only small impairments in quality. Conventional hearing aids focus on compensating for the decreased hearing ability of the peripheral auditory pathway in the frequency domain, whereas the new system compensates through the central auditory pathway in the temporal domain. When the system is applied to speech accompanied by a picture, the temporal discrepancy between the converted voice and the picture can be absorbed by changing the speech rate at every pitch period along a monotonically decreasing function from slow to fast. The system measures 180×130×65 mm. Another feature allows the user to change the speech rate quickly at any stage of conversion. The system can be applied to both spoken Japanese and foreign spoken languages, and can also be used as a listening aid for learning foreign languages.
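The "slow to fast" schedule described above can be pictured as assigning each pitch period a stretch factor that decreases monotonically over the utterance, so the converted speech starts slower and catches up with the original timing by the end. The sketch below uses a linear schedule whose mean factor is 1.0; both the linear shape and that normalization are illustrative assumptions, not the system's exact function.

```python
# Minimal sketch of a per-pitch-period rate schedule along a monotone
# decreasing line. Assuming roughly equal pitch-period lengths, a mean stretch
# factor of 1.0 keeps the total output duration close to the original, so the
# discrepancy with an accompanying picture is absorbed.

def rate_schedule(num_periods, initial_stretch=1.5):
    """Return one stretch factor per pitch period.

    Factors start at `initial_stretch` (>1, i.e. slowed down) and decrease
    linearly so that their mean is 1.0.
    """
    assert 1.0 < initial_stretch < 2.0, "illustrative range only"
    if num_periods < 2:
        return [1.0] * num_periods
    final_stretch = 2.0 - initial_stretch  # mirror around 1.0 for a linear ramp
    step = (final_stretch - initial_stretch) / (num_periods - 1)
    return [initial_stretch + i * step for i in range(num_periods)]

if __name__ == "__main__":
    factors = rate_schedule(10, initial_stretch=1.4)
    print([round(f, 2) for f in factors])
    print("mean stretch:", round(sum(factors) / len(factors), 2))  # ~1.0
```

In the actual device the stretching would be applied pitch-synchronously to the waveform, which is how pitch is kept unchanged while the rate varies.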