2020
DOI: 10.48550/arxiv.2010.05171
Preprint
fairseq S2T: Fast Speech-to-Text Modeling with fairseq

Cited by 39 publications (19 citation statements)
References 30 publications
“…We compare our systems with the Speech-to-Text Transformer model available in Fairseq [20], to evaluate the performance of our systems with respect to a baseline. In particular, we use the small architecture, which is the one with reported results 1 .…”
Section: Methods
confidence: 99%
“…These kinds of sequences are about an order of magnitude longer than text inputs; the computational cost of training the model can therefore rise critically. Hence, a common approach in ST systems is to add convolutional layers before the Transformer encoder that reduce the input sequence length [20]. Other systems also include 2D self-attention layers and a distance penalty in the attention, to bias it towards the local context [12].…”
Section: Related Work
confidence: 99%
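The length-reduction trick described in the statement above can be sketched as a small stack of strided 1-D convolutions in front of the encoder. This is a minimal illustrative sketch, not fairseq S2T's actual implementation; the module name `ConvSubsampler` and all dimensions are assumptions chosen for the example.

```python
import torch
import torch.nn as nn

class ConvSubsampler(nn.Module):
    """Hypothetical sketch: strided convolutions that shrink the time axis
    before a Transformer encoder, as commonly done in ST systems."""

    def __init__(self, in_dim=80, hidden_dim=256, kernel=5, stride=2):
        super().__init__()
        # Two stride-2 convolutions reduce the sequence length by ~4x overall.
        self.convs = nn.Sequential(
            nn.Conv1d(in_dim, hidden_dim, kernel, stride=stride, padding=kernel // 2),
            nn.GELU(),
            nn.Conv1d(hidden_dim, hidden_dim, kernel, stride=stride, padding=kernel // 2),
            nn.GELU(),
        )

    def forward(self, x):
        # x: (batch, time, features), e.g. log-mel filterbank frames
        y = self.convs(x.transpose(1, 2))  # Conv1d expects (batch, channels, time)
        return y.transpose(1, 2)           # back to (batch, time', hidden_dim)

feats = torch.randn(2, 1000, 80)  # 1000 frames of 80-dim filterbanks
out = ConvSubsampler()(feats)
print(out.shape)  # torch.Size([2, 250, 256]): 4x shorter sequence
```

Feeding the 250-step output (instead of 1000 raw frames) to the encoder cuts the quadratic self-attention cost by roughly 16x, which is why this subsampling step is near-universal in speech-to-text Transformers.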