Automatic generation of synthesis units based on context oriented clustering

Nakajima, Shigeru; Hamada, Hiroshi

doi:10.1109/icassp.1988.196672

Cited by 48 publications

(19 citation statements)

References 6 publications


“…During synthesis, pitch and duration modification are used to obtain a desired prosody. Unit selection synthesis is the most popular variant of concatenative synthesis, and was first proposed by Nakajama and Hamada in 1988 [26]. Since then various systems including commercial systems were developed resulting in a higher level of reading-style synthetic speech [27,28,29] and it is today considered as the state of the art in text-to speech synthesis.…”

Section: Fig 4 Architecture Of a Concatenative Text-to-speech System (mentioning)

confidence: 99%

An Assistive Reading System for Visually Impaired using OCR and TTS

Sharma, Srivastava, Vashishth

2014

IJCA


Reading machines are mechatronic devices that use optical character recognition and text-to-speech technology to output synthetic voice from printed text. In this paper an assistive system is proposed for visually impaired or blind persons. It reads textual information on paper and produces the corresponding voice output using OCR (Optical Character Recognition) and TTS (Text-to-speech) systems. To localize text regions in images, a connected component labeling approach using histogram analysis is applied to the binarized image. A TTS system using concatenative synthesis based on an SDK (Software Development Kit) platform is used. The system is operated via a voice-based user interface and also has a user-friendly GUI (graphical user interface) to scan the text and to control various speech parameters. The speech signal produced can be saved and reproduced for later use.


“…pitch and duration modification are used to obtain a desired prosody. Unit selection synthesis is the most popular variant of concatenative synthesis and was first proposed by Nakajama and Hamada in 1988 [15]. Since then various systems including commercial systems were developed resulting in a higher level of reading-style synthetic speech [16,17,18] and it is today considered as the state of art in text-to speech synthesis.…”

Section: Fig 1: Block Diagram Of General Text To Speech System (mentioning)

confidence: 99%

An Intelligent Text to Speech System for Windows based Systems and Mobile Devices

Srivastava, Sharma, Jain

2014

IJCA


TTS (Text-to-speech) systems are widely used in daily life and have come a long way. In this paper a TTS system using concatenative synthesis based on the SDK (Software Development Kit) platform is presented. The system is compatible with both computers and mobile devices. It has a user-friendly GUI (graphical user interface) to control various speech parameters. The speech signal produced can be saved and listened to whenever required. Signal analysis of the output speech can also be done using the TTS system. The results of these signal analyses, along with the stored speech signal, can be used for further applications depending on the requirements. It is an intelligent system and is able to overcome various normalization problems.


“…Special methods to generate a unit inventory have been proposed by the research group at NTT in Japan (10,11). The synthesis allophones are selected with the help of the context-oriented clustering (COC) method.…”

Section: Concatenation Of Units (mentioning)

confidence: 99%

Models of speech synthesis.

Carlson

1995

Proc. Natl. Acad. Sci. U.S.A.


The term "speech synthesis" has been used for diverse technical approaches. In this paper, some of the approaches used to generate synthetic speech in a text-to-speech system are reviewed, and some of the basic motivations for choosing one method over another are discussed. It is important to keep in mind, however, that speech synthesis models are needed not just for speech generation but to help us understand how speech is created, or even how articulation can explain language structure. General issues such as the synthesis of different voices, accents, and multiple languages are discussed as special challenges facing the speech synthesis community.

The term "speech synthesis" has been used for diverse technical approaches. Unfortunately, any speech output from computers has been claimed to be speech synthesis, perhaps with the exception of playback of recorded speech. Some of the approaches used to generate true synthetic speech as well as high-quality waveform concatenation methods are presented below.

Knowledge About Natural Speech

Synthesis development can be grouped into three main categories: acoustic models, articulatory models, and models based on the coding of natural speech. The last group includes both predictive coding and concatenative synthesis using speech waveforms. Acoustic and articulatory models have had a long history of development, while natural speech models represent a somewhat newer field. The first commercial systems were based on the acoustic terminal analog synthesizer. However, at that time, the voice quality was not good enough for general use, and approaches based on coding attracted increased interest. Articulatory models have been under continuous development, but so far this field has not been exposed to commercial applications due to incomplete models and high processing costs.

We can position the different synthesis methods along a "knowledge about speech" scale. Obviously, articulatory synthesis needs considerable understanding of the speech act itself, while models based on coding use such knowledge only to a limited extent. All synthesis methods have to model something that is partly unknown. Unfortunately, artificial obstacles due to simplifications or lack of coverage will also be introduced. A trend in current speech technology, both in speech understanding and speech production, is to avoid explicit formulation of knowledge and to use automatic methods to aid the development of the system. Since such analysis methods lack the human ability to generalize, the generalization has to be present in the data itself. Thus, these methods need large amounts of speech data. Models working close to the waveform are now typically making use of increased unit sizes while still modeling prosody by rule. In the middle of the scale, "formant synthesis" is moving toward the articulatory models by looking for "higher-level parameters" or to larger prestored units. Articulatory synthesis, hampered by lack of data, still has some way to go but is yielding improved quality, due mostly...
