Text-to-speech (TTS) conversion turns written text into the corresponding speech. It is a very useful application for visually impaired and speech-impaired people. An optical character recognition (OCR)-based TTS system to help visually challenged people has been proposed in [1], in which the text produced by the OCR stage is converted into speech. The authors used blind deconvolution and pre-processing operations to remove the effects of noise and blur, so that the framework gives efficient results for visually challenged users. Nowadays, high-quality TTS software is commercially available for different languages. The most widely used speech synthesis approaches are articulatory synthesis, formant synthesis, concatenative synthesis and hidden Markov model (HMM)-based synthesis. Each approach has its own advantages and disadvantages depending on the language. Among them, the concatenative synthesis approach is used in our system because it generates natural-sounding speech from pre-recorded sound. There is a tradeoff between speech quality and system size that depends on the speech unit chosen for concatenation. The speech units in current use are the word, syllable, phoneme, diphone, triphone and so on.

Many TTS systems [2–6] have been implemented using the concatenative method with different speech units, and they can generate high-quality synthesized speech. A numerical TTS synthesis system for three languages, Marathi, Hindi and English, is proposed in [7]. It combines a rule-based approach with a concatenation-based approach, and all utterances of the sound units are used for concatenation and generation of the speech signal. The authors of [8] compare two Arabic text-to-speech systems, both screen readers: non-visual desktop access (NVDA) and the integrated bilingual solution for the blind or visually impaired in the Arab (IBSAR).
They tested the quality of two systems in terms of