2020
DOI: 10.14209/jcis.2020.25

An open-source end-to-end ASR system for Brazilian Portuguese using DNNs built from newly assembled corpora

Abstract: In this work, we present a baseline end-to-end system based on deep learning for automatic speech recognition in Brazilian Portuguese. To build such a model, we assemble a speech corpus containing 158 hours of annotated speech from four individual datasets, three of them publicly available, and a text corpus containing 10.2 million sentences. We train an acoustic model based on the DeepSpeech 2 network, with two convolutional and five bidirectional recurrent layers. By adding a newly trained 15-gram…

Cited by 7 publications (6 citation statements)
References 34 publications
“…The model trained with only 1 real speaker in the target language (TTS dataset) with data augmentation using voice conversion and TTS, achieved a WER of 33.96% and 36.59%, respectively. For comparison in pt-BR, [29] used ca. 158 hrs of speech and a nonself-supervised model without an external LM and achieved a WER of 47.41% on the test set of BRSD v2 dataset.…”
Section: Human X Generated + 1 Real Speaker (mentioning)
confidence: 99%
“…In the area of open ASR systems for Portuguese, the works of [Quintanilha 2017] and [Quintanilha et al 2020] stand out as important recent advances. [Quintanilha 2017] proposed building a Brazilian Portuguese dataset by merging several other available datasets.…”
Section: Related Work (unclassified)
“…The author obtained a WER of 25.13% on the proposed test set, 11% higher than the commercial systems compared in that work. More recently, [Quintanilha et al 2020] proposed an improved version of the dataset used in [Quintanilha 2017] by adding the CETUC dataset [Alencar and Alcaim 2008], containing approximately 145 hours of speech, to the dataset of the earlier work, in order to train a topology based on DeepSpeech 2. The authors achieved a WER of 25.45% on the proposed test set.…”
Section: Related Work (unclassified)
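
The citation statements above compare systems by word error rate (WER). As a rough illustration of how such a figure is usually obtained, the sketch below computes WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; the cited works may normalize text differently before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is a plain
    whitespace split, with no case folding or punctuation removal.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.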