Hearing impairment, including congenital deafness, is frequently found in children and is followed by delayed speech. Furthermore, the number of speech therapists currently available is limited. In this research, we outline the development of an Indonesian audio-visual speech synthesis system to support learning for deaf children with delayed speech. First, we developed two kinds of Indonesian corpora: a speech corpus and an audio-visual corpus. The speech corpus contains speech recordings from professional speech therapists; the recorded Indonesian speech database totals more than 18 hours of audio. The audio-visual corpus contains visual phonemes (visemes), which visualize the lip shapes of Indonesian phonemes. Segmentation and labeling were conducted to create transcriptions. We varied the number and the type of sentences used to train the speech synthesizer. Audio-visual synthesis used the viseme concatenation method. The objective evaluation using the Mel-cepstrum distortion method yielded 2.8. The subjective evaluation using the Mean Opinion Score yielded 3.71. The evaluation results showed that the new design of Indonesian audio-visual speech synthesis for learning to produce any single meaningful word can be used by hospitals as an alternative for the therapy of patients with delayed speech.
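The viseme concatenation step mentioned above can be pictured with a minimal sketch. The abstract does not give the viseme inventory or the data layout, so the phoneme-to-viseme table, the viseme names, and the function name below are all hypothetical and serve only to illustrate the idea of mapping each phoneme to a lip shape and joining the results in order.

```python
# Hypothetical many-to-one phoneme-to-viseme table (illustrative only;
# this is NOT the inventory used in the study).
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "a": "open", "i": "spread", "e": "spread",
    "u": "rounded", "o": "rounded",
}


def concatenate_visemes(phonemes, table=PHONEME_TO_VISEME, default="neutral"):
    """Map each phoneme to its viseme label and join them in time order.

    Phonemes missing from the table fall back to `default`.
    """
    return [table.get(phoneme, default) for phoneme in phonemes]


if __name__ == "__main__":
    # "mama" written as a phoneme sequence.
    print(concatenate_visemes(["m", "a", "m", "a"]))
```

In a real system each viseme label would point to an image or video segment of the lips, and the concatenated sequence would be synchronized with the synthesized audio.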
The sparse method, better known as compressed sensing (CS), is often used for signal reconstruction. It is considered better than conventional methods because it can reconstruct a signal from a smaller amount of data. Many algorithms have been used for signal reconstruction with CS, including l1-minimization and orthogonal matching pursuit (OMP). In this study, the two algorithms were used to reconstruct signals of underwater objects and then compared to determine which performs better for this task. The comparison was based on PSNR and RMSE as functions of sparsity. The simulations show that the l1-minimization algorithm can reconstruct signals up to 40% sparsity, whereas the OMP algorithm can only reconstruct signals up to 30% sparsity. The PSNR and RMSE obtained with l1-minimization show that it provides better reconstruction results than OMP for underwater object signals. The results also show that the best tracking process occurs at an angle of incidence of 90°.
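As a rough illustration of the OMP side of this comparison, the sketch below implements the textbook greedy OMP loop together with RMSE and PSNR helpers. It is not the authors' code: the function names, the use of the signal's peak absolute value as the PSNR reference, and the choice to pass the sparsity level `k` explicitly are assumptions made for this example, and l1-minimization is not shown.

```python
import numpy as np


def omp(A, y, k):
    """Textbook orthogonal matching pursuit: estimate a k-sparse x with y ≈ A @ x."""
    n = A.shape[1]
    residual = np.array(y, dtype=float)
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = 0.0  # do not reselect chosen columns
        support.append(int(np.argmax(correlations)))
        # Least-squares fit on the selected columns, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(n)
    x_hat[support] = coef
    return x_hat


def rmse(x, x_hat):
    """Root-mean-square error between the original and reconstructed signal."""
    return float(np.sqrt(np.mean((np.asarray(x) - np.asarray(x_hat)) ** 2)))


def psnr(x, x_hat):
    """PSNR in dB, using the peak absolute value of x as reference (an assumption)."""
    err = rmse(x, x_hat)
    if err == 0.0:
        return float("inf")
    return float(20.0 * np.log10(np.max(np.abs(x)) / err))
```

To mirror the experiment described above, one would sweep the sparsity level, reconstruct with each algorithm, and record `psnr` and `rmse` at each level.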
In this paper, we compare the naturalness of Bahasa Indonesia speech synthesis when the speech transcription is created with festvox automatic segmentation and labeling versus hand segmentation and labeling. First, we developed a phonetically balanced speech corpus of 1549 declarative and question sentences uttered by six male and female speakers. We selected 47, 72, 119, 450, 929, and 1379 sentences for training while maintaining the phonetic balance. The objective is to find the smallest amount of training data needed for evaluating synthesized naturalness with both automatic and hand segmentation and labeling. With 47 training sentences, which took about 45 minutes to train, the Mel-cepstrum distortion was 2.9 for hand segmentation and labeling and 5.36 for automatic. With 1379 sentences and about 9 hours of training time, it improved to 2.46 for hand segmentation and labeling and 4.78 for automatic. The Mean Opinion Score was 3.98 for hand and 3.04 for automatic segmentation and labeling, a performance improvement of about 18%. The automatic segmentation and labeling introduced phoneme boundary errors, which suggests that segmentation and labeling require careful consideration.
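For readers unfamiliar with the objective metric, the sketch below computes Mel-cepstral distortion in its commonly used form, averaged over frames and excluding the 0th (energy) coefficient. The abstract does not state which variant was used, so this formula, the function name, and the assumption that the reference and synthesized frames are already time-aligned (for example by dynamic time warping) are illustrative choices rather than the authors' procedure.

```python
import numpy as np


def mel_cepstral_distortion(reference, synthesized):
    """Mean Mel-cepstral distortion in dB over time-aligned frames.

    Both inputs have shape (frames, coefficients); column 0 (energy)
    is skipped, as is common practice.
    """
    reference = np.asarray(reference, dtype=float)
    synthesized = np.asarray(synthesized, dtype=float)
    if reference.shape != synthesized.shape:
        raise ValueError("inputs must be time-aligned and have the same shape")
    diff = reference[:, 1:] - synthesized[:, 1:]
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))
```

Lower values indicate that the synthesized speech is spectrally closer to the natural reference, which is consistent with the lower scores reported for hand segmentation and labeling.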