CORAA ASR: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese

Cândido, Arnaldo; Casanova, Edresson; Soares, Anderson da Silva; Oliveira, Frederico Santos de; Oliveira, Lucas Ferro Antunes de; Fernandes, Ricardo Corso; Silva, Daniel Peixoto Pinto da; Fayet, Fernando Gorgulho; Carlotto, Bruno Baldissera; Gris, Lucas Rafael Stefanel; Aluísio, Sandra Maria

doi:10.1007/s10579-022-09621-4

Cited by 6 publications

(6 citation statements)

References 24 publications


“…The dataset used for fine-tuning, called CORAA (Corpus of Annotated Audios) v1, consists of 290.77 hours of Brazilian Portuguese audio. The model fine-tuning, carried out by [Junior et al 2021], was performed on an NVIDIA TESLA V100 32GB GPU, with a batch size of 8 and gradient accumulation over 24 steps. To assist transcription, a language model was built: an auxiliary file that allows small variations in the transcription to be adjusted according to a dictionary of possible word combinations.…”
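The quoted setup (batch size 8, gradient accumulation over 24 steps) implies an effective batch of 8 × 24 = 192 examples per optimizer update, assuming a single GPU and equal-sized micro-batches. The sketch below illustrates why, using a toy linear model with mean-squared-error loss in plain Python instead of the actual wav2vec 2.0 training code; the model, data, and function names are hypothetical and chosen only for illustration.

```python
import random

def grad_mse(w, b, xs, ys):
    """Gradient of the mean squared error for the toy model y_hat = w*x + b."""
    n = len(xs)
    gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return gw, gb

def accumulated_grad(w, b, xs, ys, micro_batch=8, accum_steps=24):
    """Sum per-micro-batch gradients, each scaled by 1/accum_steps,
    before a single (omitted) optimizer step."""
    gw_total, gb_total = 0.0, 0.0
    for step in range(accum_steps):
        start = step * micro_batch
        gw, gb = grad_mse(w, b, xs[start:start + micro_batch],
                          ys[start:start + micro_batch])
        gw_total += gw / accum_steps
        gb_total += gb / accum_steps
    return gw_total, gb_total

MICRO_BATCH, ACCUM_STEPS = 8, 24
EFFECTIVE_BATCH = MICRO_BATCH * ACCUM_STEPS  # 192 examples per optimizer step

# Hypothetical toy data: y = 3x + 0.5 plus small noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(EFFECTIVE_BATCH)]
ys = [3 * x + 0.5 + rng.gauss(0, 0.1) for x in xs]

full = grad_mse(0.0, 0.0, xs, ys)  # one batch of 192
accum = accumulated_grad(0.0, 0.0, xs, ys, MICRO_BATCH, ACCUM_STEPS)
print(EFFECTIVE_BATCH, full, accum)
```

Because every micro-batch has the same size, the scaled sum equals the full-batch mean gradient up to floating-point rounding; with unequal micro-batches the scaling would need to be weighted by batch size.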

Section: Audio ID (unclassified)

Identificação de silabação em áudios de leitura de crianças em anos iniciais [Identifying syllabification in reading recordings of children in the early school years]

Jucá, Rocha, Mello et al. 2023

Anais do XXXIV Simpósio Brasileiro de Informática na Educação (SBIE 2023)


This article addresses the automatic detection of syllabification (syllable-by-syllable reading) in audio recordings of children who are learning to read, one of the challenges in assessing reading fluency. In this context, automatic speech recognition (ASR) makes it possible to process the recordings quickly and objectively, producing acoustic metrics on syllable duration and on the duration of the intervals between syllables. This work therefore proposes heuristics that use these features to classify syllabification automatically. The results reached an accuracy of 0.87 on a validation set, which indicates that automatic classification of syllabification can be applied to fluency assessment.
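The abstract does not specify the heuristics themselves. As a purely hypothetical sketch, assuming syllable start and end times are already available (for example from an ASR alignment), a rule could combine the two features the abstract names: mean syllable duration and the share of long pauses between consecutive syllables. The thresholds, class, and function names below are invented for illustration and are not the authors' values or method.

```python
from dataclasses import dataclass

@dataclass
class Syllable:
    start: float  # seconds
    end: float    # seconds

# HYPOTHETICAL thresholds, chosen for illustration only (not from the paper).
MIN_GAP_S = 0.15          # a pause at least this long counts as "long"
MIN_LONG_GAP_RATIO = 0.5  # share of long pauses needed to flag syllabification
MIN_SYLLABLE_S = 0.25     # mean syllable duration considered "drawn out"

def acoustic_features(syllables):
    """Return (mean syllable duration, share of inter-syllable gaps >= MIN_GAP_S)."""
    durations = [s.end - s.start for s in syllables]
    gaps = [b.start - a.end for a, b in zip(syllables, syllables[1:])]
    mean_duration = sum(durations) / len(durations)
    long_gap_ratio = sum(g >= MIN_GAP_S for g in gaps) / len(gaps) if gaps else 0.0
    return mean_duration, long_gap_ratio

def is_syllabified(syllables) -> bool:
    """Flag a word as syllabified when pauses are mostly long AND syllables are drawn out."""
    if len(syllables) < 2:
        return False
    mean_duration, long_gap_ratio = acoustic_features(syllables)
    return long_gap_ratio >= MIN_LONG_GAP_RATIO and mean_duration >= MIN_SYLLABLE_S

# Invented timings: a fluent three-syllable word vs. one read syllable by syllable.
fluent = [Syllable(0.0, 0.18), Syllable(0.19, 0.36), Syllable(0.37, 0.55)]
syllabified = [Syllable(0.0, 0.30), Syllable(0.60, 0.92), Syllable(1.30, 1.58)]
print(is_syllabified(fluent), is_syllabified(syllabified))
```

Requiring both conditions keeps the rule conservative: a single long pause or a slow but continuous reading alone does not trigger the flag.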


“…Our methods tested three open-source ASR deep learning fine-tuning pretrained models: Wav2Vec 2.0, HuBert, and WavLM [13][14][15]. Notably, these models are trained on large publicly available English language datasets and there is a lack of similar datasets in Brazilian Portuguese [16]. These existing models were fine-tuned to work with Brazilian Portuguese emergency calls by using the original hyperparameters from the pretrained models.…”

Section: ASR Model for Brazilian Portuguese (mentioning, confidence: 99%)

AI-based approach for transcribing and classifying unstructured emergency call data: A methodological proposal

Costa, Pinna, Joiner et al. 2023

PLOS Digit Health


Emergency care-sensitive conditions (ECSCs) require rapid identification and treatment and are responsible for over half of all deaths worldwide. Prehospital emergency care (PEC) can provide rapid treatment and access to definitive care for many ECSCs and can reduce mortality in several different settings. The objective of this study is to propose a method for using artificial intelligence (AI) and machine learning (ML) to transcribe audio, extract, and classify unstructured emergency call data in the Serviço de Atendimento Móvel de Urgência (SAMU) system in southern Brazil. The study used all “1-9-2” calls received in 2019 by the SAMU Novo Norte Emergency Regulation Center (ERC) call center in Maringá, in the Brazilian state of Paraná. The calls were processed through a pipeline using machine learning algorithms, including Automatic Speech Recognition (ASR) models for transcription of audio calls in Portuguese, and a Natural Language Understanding (NLU) classification model. The pipeline was trained and validated using a dataset of labeled calls, which were manually classified by medical students using LabelStudio. The results showed that the AI model was able to accurately transcribe the audio with a Word Error Rate of 42.12% using Wav2Vec 2.0 for ASR transcription of audio calls in Portuguese. Additionally, the NLU classification model had an accuracy of 73.9% in classifying the calls into different categories in a validation subset. The study found that using AI to categorize emergency calls in low- and middle-income countries is largely unexplored, and the applicability of conventional open-source ML models trained on English language datasets is unclear for non-English speaking countries. The study concludes that AI can be used to transcribe audio and extract and classify unstructured emergency call data in an emergency system in southern Brazil as an initial step towards developing a decision-making support tool.
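The abstract above reports a Word Error Rate (WER) of 42.12%. WER is commonly defined as (S + D + I) / N: the word-level edit distance (substitutions, deletions, insertions) between the reference and the ASR hypothesis, divided by the number of reference words N. A minimal Python sketch of that definition follows; the example sentences are invented for illustration and are not from the SAMU data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

# Invented example: 1 deletion ("the") + 1 substitution ("chest" -> "chess")
# over 5 reference words.
wer = word_error_rate("the patient has chest pain", "patient has chess pain")
print(f"WER = {wer:.2f}")  # → WER = 0.40
```

Because insertions are counted against the reference length, WER can exceed 1.0 when the hypothesis contains many extra words.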


“…Existing resources for Portuguese are composed of audios containing only read and prepared speeches, and there is a lack of datasets that include spontaneous speeches, essential in different applications. An exception is a new dataset in Portuguese designated as CORAA (Junior et al, 2021) that is composed of five different [corpora] of European and Brazilian Portuguese conversations. They tried to bridge the gap of lack of spontaneity and formal speech by having only real conversations.…”

Section: Datasets (mentioning, confidence: 99%)

Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop

2023


Although the curse of multilinguality significantly restricts the language abilities of multilingual models in monolingual settings, researchers still have to rely on multilingual models to develop state-of-the-art systems in Vietnamese Machine Reading Comprehension. This difficulty stems from the limited number of high-quality works on developing Vietnamese language models. In order to encourage more work in this research field, we present a comprehensive analysis of language weaknesses and strengths of current Vietnamese monolingual models using the downstream task of Machine Reading Comprehension. From the analysis results, we suggest new directions for developing Vietnamese language models. Besides this main contribution, we also successfully reveal the existence of artifacts in Vietnamese Machine Reading Comprehension benchmarks and suggest an urgent need for new high-quality benchmarks to track the progress of Vietnamese Machine Reading Comprehension. Moreover, we introduce a minor but valuable modification to the process of annotating unanswerable questions for Machine Reading Comprehension from previous work. Our proposed modification raises the difficulty of unanswerable questions for Machine Reading Comprehension systems to solve.

