ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9053901

Instance-based Model Adaptation for Direct Speech Translation

Abstract: Despite recent technology advancements, the effectiveness of neural approaches to end-to-end speech-to-text translation is still limited by the paucity of publicly available training corpora. We tackle this limitation with a method to improve data exploitation and boost the system's performance at inference time. Our approach allows us to customize "on the fly" an existing model to each incoming translation request. At its core, it exploits an instance selection procedure to retrieve, from a given pool of data…
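The abstract outlines a retrieve-then-fine-tune loop: select pool samples similar to the incoming request, adapt the model on them, then translate. The sketch below is a self-contained toy illustration of that idea, not the paper's implementation: a tiny linear model stands in for the speech translation network, cosine similarity over feature vectors stands in for the instance selection procedure, and a few gradient steps stand in for the request-specific fine-tuning.

```python
import copy
import torch
import torch.nn.functional as F

def retrieve_similar(pool_feats, request_feat, k):
    # Instance selection: indices of the k pool items most similar to the request.
    sims = F.cosine_similarity(pool_feats, request_feat.unsqueeze(0), dim=1)
    return sims.topk(k).indices

def adapt_and_translate(model, request_feat, pool_feats, pool_targets, k=4, steps=3):
    adapted = copy.deepcopy(model)              # leave the base model untouched
    idx = retrieve_similar(pool_feats, request_feat, k)
    opt = torch.optim.SGD(adapted.parameters(), lr=0.05)
    for _ in range(steps):                      # brief request-specific fine-tuning
        loss = F.mse_loss(adapted(pool_feats[idx]), pool_targets[idx])
        opt.zero_grad()
        loss.backward()
        opt.step()
    return adapted(request_feat)                # decode the incoming request

# Toy usage: 16-dim vectors stand in for audio representations.
model = torch.nn.Linear(16, 8)
pool_feats, pool_targets = torch.randn(100, 16), torch.randn(100, 8)
out = adapt_and_translate(model, torch.randn(16), pool_feats, pool_targets)
print(out.shape)  # torch.Size([8])
```

Copying the model per request keeps adaptation stateless across requests, so each customization starts from the same base parameters, at the cost of extra memory.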

Cited by 8 publications (5 citation statements) · References 23 publications
“…The results are presented in Table 4. The performance of our character-level model is slightly worse than, but comparable with, the results reported in (Di Gangi et al., 2020) and in (Nguyen et al., 2019). Unidirectional refers to one-to-one systems. The other results are computed with one multilingual system for En→De,NL and one for En→Es,Fr,It,Pt.…”
Section: Work (supporting)
confidence: 86%
“…The results on all the languages of MuST-C are presented in Table 3. Our character-level results are similar but not identical to the ones presented in (Di Gangi et al., 2020). Our BPE-level results outperform the character-level ones by at least 1.2 BLEU points on En-Ru and by up to 3.3 points on En-Fr, with improvements of about 2 points in most of the languages.…”
Section: Work (supporting)
confidence: 66%
“…On the one hand, the audio source avoids the error propagation and exposure bias introduced by using as context the translations generated at inference time. On the other, these problems are balanced by the greater ease of extracting information from text rather than from audio [12]. In this work, we study both options.…”
Section: Context-aware ST (mentioning)
confidence: 99%
“…When we use the generated translations as context, their tokens are converted into vectors with word embeddings (namely, we re-use the decoder embeddings), summed with positional encodings, and then provided to the encoder's Transformer layers. When we use the audio as context, the input audio features are first processed by the encoder of the base model and then passed to the context encoder [12]. Sequential (Figure 1).…”
Section: Context-aware ST (mentioning)
confidence: 99%
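The statement above describes two context paths: generated-translation tokens embedded with the re-used decoder embeddings, summed with positional encodings, and fed to the context encoder; or audio features first processed by the base model's encoder. Below is a minimal PyTorch sketch of such a context encoder; the module names and dimensions are assumptions for illustration, not the cited paper's code.

```python
import math
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    def __init__(self, d_model=256, vocab=1000, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)  # stands in for the re-used decoder embeddings
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.layers = nn.TransformerEncoder(layer, n_layers)
        self.d_model = d_model

    def positional_encoding(self, length):
        # Standard sinusoidal positional encoding of shape (length, d_model).
        pos = torch.arange(length).unsqueeze(1)
        div = torch.exp(torch.arange(0, self.d_model, 2) * (-math.log(10000.0) / self.d_model))
        pe = torch.zeros(length, self.d_model)
        pe[:, 0::2], pe[:, 1::2] = torch.sin(pos * div), torch.cos(pos * div)
        return pe

    def forward_text(self, tokens):
        # Text context: embeddings summed with positional encodings.
        x = self.embed(tokens) + self.positional_encoding(tokens.size(1))
        return self.layers(x)

    def forward_audio(self, audio_states):
        # Audio context: features already encoded by the base model's encoder.
        return self.layers(audio_states)

# Toy usage with assumed shapes.
ctx = ContextEncoder()
print(ctx.forward_text(torch.randint(0, 1000, (2, 7))).shape)  # (2, 7, 256)
print(ctx.forward_audio(torch.randn(2, 50, 256)).shape)        # (2, 50, 256)
```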
“…[3] The instance-based method lowers the error by weighting the source samples and training on the weighted source samples. [4] The feature-based methods usually transform the features of the source and the target domains into a shared space where the feature distributions of the two data sets match. The domain adaptation method based on feature representation is the most commonly used.…”
Section: Introduction (mentioning)
confidence: 99%
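As a concrete illustration of the instance-based idea mentioned in this statement, the toy sketch below weights source samples by similarity to the target domain and trains on the weighted loss. The similarity-based weighting rule is an assumption chosen for brevity; actual instance-based methods typically estimate weights more carefully, e.g., from density ratios between domains.

```python
import torch
import torch.nn.functional as F

def instance_weights(source_x, target_x):
    # Higher weight for source samples that look like the target domain
    # (cosine similarity to the target-domain mean, normalized to mean 1).
    sims = F.cosine_similarity(source_x, target_x.mean(0, keepdim=True), dim=1)
    return torch.softmax(sims, dim=0) * len(sims)

def train_weighted(model, source_x, source_y, target_x, epochs=50, lr=0.1):
    w = instance_weights(source_x, target_x)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        per_sample = F.mse_loss(model(source_x), source_y, reduction="none").mean(1)
        loss = (w * per_sample).mean()  # weighted source-sample loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

# Toy usage: the target domain is a shifted version of the source domain.
model = torch.nn.Linear(8, 1)
src_x, src_y = torch.randn(200, 8), torch.randn(200, 1)
tgt_x = torch.randn(50, 8) + 1.0
train_weighted(model, src_x, src_y, tgt_x)
```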