Proceedings of the 6th Workshop on Asian Translation 2019
DOI: 10.18653/v1/d19-5204

Designing the Business Conversation Corpus

Abstract: While the progress of machine translation of written text has come far in the past several years thanks to the increasing availability of parallel corpora and corpora-based training technologies, automatic translation of spoken text and dialogues remains challenging even for modern systems. In this paper, we aim to boost the machine translation quality of conversational texts by introducing a newly constructed Japanese-English business conversation parallel corpus. A detailed analysis of the corpus is provided…

Cited by 14 publications
(11 citation statements)
References 12 publications
“…We used all the test sets in our previous work (Morishita et al, 2020), which included the Asian Scientific Paper Excerpt Corpus (ASPEC) (Nakazawa et al, 2016), the Japanese-English Subtitle Corpus (JESC) (Pryzant et al, 2017), the Kyoto Free Translation Task (KFTT) (Neubig, 2011), and TED talks (tst2015) (Cettolo et al, 2012). We also evaluated our models on the Business Scene Dialogue Corpus (Rikters et al, 2019) to check whether they worked on conversations. We also added test sets from shared tasks: WMT 2020, 2021 news translation shared tasks (Barrault et al, 2020; Akhbardeh et al, 2021), WMT 2019, 2020 robustness shared tasks (Li et al, 2019; Specia et al, 2020), and the IWSLT 2021 simultaneous translation task (Anastasopoulos et al, 2021).…”
Section: Test Sets (mentioning)
confidence: 99%
“…This task is running successfully in WAT since 2019 and attracted many teams working on multimodal machine translation and image captioning in Indian languages (Nakazawa et al, 2019.…”
Section: English→Hindi Multi-modal Task (mentioning)
confidence: 99%
“…For the analysis, we use the Business Scene Dialogue Corpus (Rikters et al, 2019), which is a Japanese and English parallel corpus in the conversational domain. Besides the published data, we also use the in-house version of the corpus, which amounts to a total of 104,961 sentence pairs.…”
Section: Is Local Context Useful For Predicting (mentioning)
confidence: 99%
“…Thus, a parallel document- and sentence-aligned conversation corpus would be advantageous in taking MT research in this field to the next stage. In this paper, we introduce a newly constructed Japanese-English conversation corpus that contains three sub-corpora: business scene dialogue (BSD) (Rikters et al 2019), Japanese translation of AMI meeting corpus (AMI) (McCowan et al 2005), and the Japanese translation of OntoNotes 5.0 (ON) (Weischedel et al 2011). The corpus contains multiperson conversations in various situations: business scenes, meetings under specific themes, broadcast conversations, and telephone conversations.…”
Section: Introduction (mentioning)
confidence: 99%