1st Workshop on Intelligent Infocommunication Networks, Systems and Services 2023
DOI: 10.3311/wins2023-014

Implementing a Text-to-Speech synthesis model on a Raspberry Pi for Industrial Applications

Abstract: Text-to-Speech (TTS) technology produces human-like speech from input text. It has recently gained prominence through the application of deep neural networks. Today, end-to-end TTS models produce highly natural synthesized speech but require extremely high computational resources. Deploying such high-quality TTS models in a real-time environment has been a challenging problem because of the limited resources of embedded systems and cell phones. This paper demonstrated the implementation of an end-to-end TTS model (FastSpeec…

Citations: cited by 1 publication (2 citation statements)
References: 11 publications
“…With p > 0.05, the two systems were significantly different. Moreover, it outperforms our earlier study results on RPi with the TTS model of FastSpeech 2 and HiFi-GAN V1, which obtained RTF 8.93 [21]. After that, we tested the cache mechanism in the system with the same ten sentences.…”
Section: Real-time System Evaluation and Runtime Analysis
Citation type: mentioning
confidence: 75%
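
The quoted statement reports a real-time factor (RTF) without defining it. As a rough illustration, assuming the common definition (synthesis time divided by the duration of the generated audio), an RTF of 8.93 would mean that each second of speech takes about 8.93 seconds to synthesize. The function name and the timing values below are hypothetical and are not taken from either paper.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by the duration of the generated audio.

    Assumes the common RTF definition; values below 1.0 mean the system
    synthesizes speech faster than it takes to play it back.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


# Hypothetical timing: 44.65 s of compute to produce a 5 s utterance.
rtf = real_time_factor(44.65, 5.0)
print(f"RTF = {rtf:.2f}")
```

Under this definition, a system is only suitable for real-time use when its RTF stays below 1.0 on the target hardware.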
“…Following our previous study [21], we built an intelligent alarm model using a modified TTS of an acoustic model (FastSpeech 2) with a neural vocoder (HiFi-GAN V3). We compared our TTS model inference speech speed with a baseline TTS model (FastSpeech 2 and MelGAN vocoder).…”
Section: Introduction
Citation type: mentioning
confidence: 99%