2021 6th International Conference on Computer Science and Engineering (UBMK)
DOI: 10.1109/ubmk52708.2021.9558954
Deep Learning for Videoconferencing: A Brief Examination of Speech to Text and Speech Synthesis

Cited by 5 publications (4 citation statements). References 22 publications.
“…This study also analyses the experimental outcomes of applying two state-of-the-art pre-trained models to various test conditions and comparing the results. AI may power future video conferencing systems [22]. This study provides an overview of transcription and speech synthesis systems that are based on deep learning.…”
Section: Literature Review
confidence: 99%
“…In addition, the experimental findings of two cutting-edge pre-trained models are also scrutinized. AI may power future video conferencing systems [23].…”
Section: Literature Review
confidence: 99%
“…(16). Our knowledge of human speech processes is still incomplete, and the quality of text-to-speech is far from natural-sounding (17). Here the researcher generates and analyzes the prosodic information from the recorded Sindhi sounds using a back-propagation neural network (18).…”
Section: Related Work
confidence: 99%
“…Among them, ASR has been popularly deployed for voice-enabled information retrieval using artificial intelligence (AI) speakers and chatbots [ 5 , 6 , 7 , 8 ]. It has also been used for the transcription of social media videos [ 9 ] and video conferencing [ 10 , 11 ]. Traditionally, an ASR system is composed of three modules: a feature extractor for representation of the speech signal, an acoustic model for mapping acoustic features to linguistic units, and a language model regarding the grammar, lexicon, etc.…”
Section: Introduction
confidence: 99%
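The excerpt above describes the traditional three-module decomposition of an ASR system: a feature extractor, an acoustic model mapping features to linguistic units, and a language model over the unit sequences. The sketch below is only an illustration of that decomposition, not code from the cited paper or any specific toolkit; all names (extract_features, AcousticModel, LanguageModel, decode) are hypothetical, and the models are toy stand-ins for trained components.

```python
# Minimal sketch of the classical three-module ASR pipeline described above.
# All names are hypothetical; the "models" are toy stand-ins, not trained systems.
import numpy as np


def extract_features(signal: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Feature extractor: frame the waveform and compute per-frame log energy
    (a stand-in for MFCC or filter-bank features)."""
    n_frames = max(1, 1 + (len(signal) - frame_len) // hop)
    frames = np.stack([signal[i * hop: i * hop + frame_len] for i in range(n_frames)])
    return np.log(np.sum(frames ** 2, axis=1) + 1e-8)[:, None]


class AcousticModel:
    """Acoustic model: maps each feature frame to scores over linguistic units
    (here a toy 3-unit vocabulary with random weights)."""
    def __init__(self, n_units: int = 3, dim: int = 1, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(dim, n_units))

    def frame_scores(self, feats: np.ndarray) -> np.ndarray:
        logits = feats @ self.w
        return logits - logits.max(axis=1, keepdims=True)  # stabilised scores


class LanguageModel:
    """Language model: assigns a prior score to a unit sequence
    (here a toy bonus for repeated units, in place of an n-gram or neural LM)."""
    def score(self, units: list[int]) -> float:
        return sum(0.1 for a, b in zip(units, units[1:]) if a == b)


def decode(signal: np.ndarray) -> list[int]:
    """Greedy decoding combining the three modules."""
    feats = extract_features(signal)
    am, lm = AcousticModel(dim=feats.shape[1]), LanguageModel()
    units = [int(np.argmax(row)) for row in am.frame_scores(feats)]
    _ = lm.score(units)  # a real decoder would use the LM to rescore hypotheses
    return units


if __name__ == "__main__":
    audio = np.random.default_rng(1).normal(size=16000)  # 1 s of synthetic "audio"
    print(decode(audio))
```

End-to-end deep learning systems of the kind surveyed in the cited paper collapse these separately engineered modules into a single trainable network, which is the contrast the introduction above is drawing.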