2021
DOI: 10.1186/s13636-021-00217-4

Performance vs. hardware requirements in state-of-the-art automatic speech recognition

Abstract: The last decade brought significant advances in automatic speech recognition (ASR) thanks to the evolution of deep learning methods. ASR systems evolved from pipeline-based systems, which modeled hand-crafted speech features with probabilistic frameworks and generated phone posteriors, to end-to-end (E2E) systems, which translate the raw waveform directly into words using a single deep neural network (DNN). Transcription accuracy increased greatly, leading to ASR technology being integrated into many commercial a…

Cited by 19 publications
(10 citation statements)
References 57 publications
“…The number of audio datasets available in Romanian is rather low [25,26], confirming that Romanian is indeed a low-resource language. We underline that existing datasets comprising Romanian speech samples are mainly focused on automatic speech recognition, ignoring the diversity of dialects within the region.…”
Section: Introduction (mentioning)
confidence: 94%
“…The first experiments date back to the 1970s, but the developments in the sphere of parallel and distributed computing architectures, big data and artificial intelligence in the last years have given a great impetus to improve this technology and, thus, its reliability [10][11]. Compared to the past, the accuracy of transcription has actually improved to such a level that, on condition of a clear and clearly defined acoustic source, the accuracy level may well exceed 99%.…”
Section: Analysis of Last Achievements and Publications (mentioning)
confidence: 99%
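The statement above does not say which metric its "accuracy" refers to. ASR output is commonly scored by word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the system output, divided by the number of reference words, with accuracy often reported as 1 - WER. The minimal sketch below assumes that convention; the function names are illustrative and do not come from the cited work or any scoring toolkit.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length.

    Computed as the Levenshtein distance over whitespace-separated words.
    """
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # Before processing ref word i, prev[j] is the edit distance
    # between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy under the common (assumed) convention 1 - WER."""
    return 1.0 - word_error_rate(reference, hypothesis)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on a mat")` counts one substitution over six reference words, about 0.167. Under this convention, an accuracy above 99% means fewer than one word error per 100 reference words. Because insertions are counted, WER can exceed 1, so 1 - WER is not bounded below by zero.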
“…Besides the broadcast domain, the significant increase in the ASR field has brought special interests to integrate this technology in many other applications and devices. For instance, considering speech as the most natural means of communication between humans, conversational assistants have acquired great relevance in our daily lives, both in the personal and professional environments [1]. In addition, other main sectors such as Industry, Healthcare or Automotive have already discovered the usability of speech technologies mainly with the use of voice control applications integrated in machines, medical instruments or technical devices.…”
Section: Introduction (mentioning)
confidence: 99%
“…These interests have triggered special challenges for the current ASR technology, mainly related to the need to optimise and reduce neural models in order to be integrated in devices with low computational power but without a noticeable loss of quality. With the aim of meeting the requirements of embedded systems, the most common optimisation techniques rely on architecture and format optimisation as well as quantisation [1].…”
Section: Introduction (mentioning)
confidence: 99%
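The statement above names quantisation among the usual optimisations for fitting ASR models onto embedded devices but does not specify a scheme. As an assumed illustration only, the sketch below applies symmetric per-tensor int8 post-training quantisation to a list of weights: a single scale factor maps each float to an integer in [-127, 127], which needs 1 byte per weight instead of the 4 bytes of float32 storage. The function names are hypothetical; production toolkits typically add per-channel scales, activation calibration and integer compute kernels.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantisation (illustrative assumption).

    One scale maps the largest |weight| to 127, so every weight becomes
    q = round(w / scale) with q in [-127, 127].
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Clamp defensively in case floating-point division lands just past 127.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Approximate reconstruction of the original weights."""
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.27, 0.003]
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print("int8 values:", q)
    print(f"scale = {scale:.6f}, max reconstruction error = {worst:.6f}")
```

Because each weight is rounded to the nearest multiple of `scale`, the reconstruction error per weight is at most `scale / 2`; the trade-off is that a single outlier weight inflates `scale` for the whole tensor, which is one reason per-channel scales are often preferred.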