This paper aims to present the state of the art in speech recognition through a systematic literature review. To this end, 222 papers from four digital repositories were examined. The review followed a methodology composed of search questions, a search expression, and inclusion and exclusion criteria. After the abstract, introduction, and conclusion of each paper were read, nine papers were selected. The analysis of the selected papers shows that research prioritizes the following topics: (i) solutions to reduce the error rate; (ii) neural networks for language models; and (iii) n-gram statistical models. However, none of the papers offered a solution for offline voice recognition on Android mobile devices. The information obtained provides a knowledge base for developing offline voice recognition on mobile devices, and the techniques identified offer guidelines for choosing the best-performing neural networks and mechanisms for reducing error rates.
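To illustrate topic (iii), the following is a minimal sketch of an n-gram statistical language model: a bigram model with add-one (Laplace) smoothing. It is not taken from any of the reviewed papers. The class name `BigramModel`, the toy command corpus, and the choice of Laplace smoothing are all assumptions made for illustration. A recognizer can use scores like these to prefer the transcription hypothesis that is more plausible as language.

```python
import math
from collections import Counter


class BigramModel:
    """Hypothetical bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.unigrams = Counter()  # how often each token occurs
        self.bigrams = Counter()   # how often each (previous, next) pair occurs
        for sentence in sentences:
            tokens = self._tokenize(sentence)
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        # Words that can follow a context; "<s>" only ever starts a sentence.
        self.vocab = {t for t in self.unigrams if t != "<s>"}

    @staticmethod
    def _tokenize(sentence):
        # Pad with sentence-boundary markers so first and last words are modeled.
        return ["<s>"] + sentence.lower().split() + ["</s>"]

    def prob(self, prev, word):
        # P(word | prev) = (C(prev, word) + 1) / (C(prev) + V)
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + len(self.vocab))

    def log_prob(self, sentence):
        # Summing log-probabilities avoids underflow on long sentences.
        tokens = self._tokenize(sentence)
        return sum(math.log(self.prob(p, w)) for p, w in zip(tokens, tokens[1:]))


if __name__ == "__main__":
    lm = BigramModel(["turn on the light", "turn off the light", "open the door"])
    # The seen phrase scores higher than an acoustically similar unseen one.
    print(lm.log_prob("turn on the light") > lm.log_prob("turn on the night"))  # True
```

Add-one smoothing is chosen here only because it is the simplest way to keep unseen word pairs from receiving zero probability; production recognizers typically use stronger smoothing or the neural language models mentioned in topic (ii).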
The use of intelligent interfaces, usability features, and voice technologies is enabling applications to become increasingly rich, especially in assisting inexperienced users or those with special needs. As a result, many software developers are looking for ways to incorporate voice technologies into their products, and one of the most common approaches is through an Application Programming Interface (API). Voice technologies fall into two categories: voice recognition, widely used for voice commands (converting speech to text), and speech synthesis, widely used to improve device accessibility (converting text to speech).
These voice technologies use Natural Language Processing techniques, a subarea of Artificial Intelligence, to process and manipulate human language at several levels. This article surveys the main voice recognition and speech synthesis APIs, describing their characteristics and functionality. In addition, a case study shows which of the surveyed APIs was chosen and how it was implemented in the Alerta Brusque application.
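Because the article compares several recognition and synthesis APIs before choosing one, an application may benefit from keeping the two categories behind small interfaces so that providers can be swapped. The sketch below is a hypothetical design, not the one used in Alerta Brusque: the names `SpeechRecognizer`, `SpeechSynthesizer`, `VoiceAssistant`, `EchoRecognizer`, and `BytesSynthesizer` are invented here, and the two fake providers stand in for real API clients, whose actual calls are not shown.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechRecognizer(Protocol):
    """Voice recognition: converts recorded audio into text (hypothetical interface)."""

    def transcribe(self, audio: bytes, language: str) -> str: ...


class SpeechSynthesizer(Protocol):
    """Speech synthesis: converts text into playable audio (hypothetical interface)."""

    def synthesize(self, text: str, language: str) -> bytes: ...


@dataclass
class VoiceAssistant:
    """Application code depends only on the two interfaces, not on a vendor API."""

    recognizer: SpeechRecognizer
    synthesizer: SpeechSynthesizer
    language: str

    def handle(self, audio: bytes) -> bytes:
        text = self.recognizer.transcribe(audio, self.language)
        return self.synthesizer.synthesize(f"You said: {text}", self.language)


# Fake providers for testing; a real adapter would wrap the chosen API instead.
class EchoRecognizer:
    def transcribe(self, audio: bytes, language: str) -> str:
        return audio.decode("utf-8")  # pretend the "audio" is already text


class BytesSynthesizer:
    def synthesize(self, text: str, language: str) -> bytes:
        return text.encode("utf-8")  # pretend the text is rendered as audio


if __name__ == "__main__":
    assistant = VoiceAssistant(EchoRecognizer(), BytesSynthesizer(), "pt-BR")
    print(assistant.handle(b"help"))  # b'You said: help'
```

With this structure, switching to a different API means writing one new adapter class; `VoiceAssistant` and the rest of the application stay unchanged. The `"pt-BR"` language tag in the usage example is an assumption.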
Ensuring accessibility is a legal requirement, both in everyday life and in recreational activities. Electronic games are no exception: to ensure accessibility for people with disabilities (PwD), developers and researchers constantly seek to innovate on technical and methodological issues. This work presents a 2D platform game for the computer that lets the user control it through voice commands, providing a form of accessibility for people with motor disabilities. During testing, the speech recognition API showed an average error rate of 14.72% per recognition. Tests were carried out with 21 people without disabilities, who differed in their familiarity with digital games. The results show that players performed better with the standard controls than with the voice controls. This indicates that, although voice control is an accessibility option, it is less efficient than the standard options (mouse and keyboard).
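The abstract does not say how voice input is mapped to game actions or how the per-recognition error rate was computed. One plausible approach is sketched below, under stated assumptions: the command vocabulary in `COMMANDS`, the functions `to_action` and `recognition_error_rate`, and the sample trials are all hypothetical and are not the authors' data or method. Each trial pairs the action the player intended with the transcript returned by the recognizer, and a trial counts as an error when the transcript does not map to the intended action.

```python
from typing import Optional

# Hypothetical voice-command vocabulary for a 2D platformer (not from the paper).
COMMANDS = {
    "left": "MOVE_LEFT",
    "right": "MOVE_RIGHT",
    "jump": "JUMP",
    "stop": "STOP",
}


def to_action(transcript: str) -> Optional[str]:
    """Return the game action for the first known command word, or None."""
    for word in transcript.lower().split():
        if word in COMMANDS:
            return COMMANDS[word]
    return None


def recognition_error_rate(trials: list[tuple[str, str]]) -> float:
    """Percentage of (intended_action, transcript) trials that were misrecognized."""
    if not trials:
        raise ValueError("at least one trial is required")
    errors = sum(1 for intended, transcript in trials if to_action(transcript) != intended)
    return 100.0 * errors / len(trials)


if __name__ == "__main__":
    # Illustrative trials only; "write" is a misrecognition of "right".
    trials = [
        ("JUMP", "jump"),
        ("MOVE_LEFT", "left"),
        ("MOVE_RIGHT", "write"),
        ("STOP", "stop"),
    ]
    print(recognition_error_rate(trials))  # 25.0
```

Restricting recognition to a small command vocabulary is a common way to reduce misrecognitions in voice-controlled games, but recognition latency may still make voice control slower than mouse and keyboard, which is consistent with the performance gap the study reports.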