2022
DOI: 10.1007/s10579-022-09621-4

CORAA ASR: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese

Abstract: Automatic speech recognition (ASR) is a complex and challenging task. In recent years, there have been significant advances in the area. In particular, for the Brazilian Portuguese (BP) language, there were around 376 h publicly available for the ASR task until the second half of 2020. With the release of new datasets in early 2021, this number increased to 574 h. The existing resources, however, are composed of audio containing only read and prepared speech. There is a lack of datasets including spontaneous …

Citations: cited by 6 publications (6 citation statements)
References: 24 publications
“…The dataset used for fine-tuning, called CORAA (Corpus of Annotated Audios) v1, comprises 290.77 hours of audio in Brazilian Portuguese. The fine-tuning of the model, carried out by [Junior et al 2021], used an NVIDIA Tesla V100 32GB GPU, with a batch size of 8 and gradient accumulation over 24 steps. To assist transcription, a language model was built, which is an auxiliary file that allows small variations in the transcription to be adjusted according to a dictionary of possible word combinations.…”
Section: Audio ID
Citation type: unclassified
“…Our methods tested three open-source ASR deep learning fine-tuning pretrained models: Wav2Vec 2.0, HuBert, and WavLM [13][14][15]. Notably, these models are trained on large publicly available English language datasets and there is a lack of similar datasets in Brazilian Portuguese [16]. These existing models were fine-tuned to work with Brazilian Portuguese emergency calls by using the original hyperparameters from the pretrained models.…”
Section: ASR Model for Brazilian Portuguese
Citation type: mentioning
confidence: 99%
“…Existing resources for Portuguese are composed of audios containing only read and prepared speeches, and there is a lack of datasets that include spontaneous speeches, essential in different applications. An exception is a new dataset in Portuguese designated as CORAA (Junior et al, 2021) that is composed of five different of European and Brazilian Portuguese conversations. They tried to bridge the gap of lack of spontaneity and formal speech by having only real conversations.…”
Section: Datasets
Citation type: mentioning
confidence: 99%