“…We used all the test sets in our previous work (Morishita et al., 2020), which included the Asian Scientific Paper Excerpt Corpus (ASPEC) (Nakazawa et al., 2016), the Japanese-English Subtitle Corpus (JESC) (Pryzant et al., 2017), the Kyoto Free Translation Task (KFTT) (Neubig, 2011), and TED talks (tst2015) (Cettolo et al., 2012). We also evaluated our models on the Business Scene Dialogue Corpus (Rikters et al., 2019) to check whether they worked on conversations. In addition, we added test sets from shared tasks: the WMT 2020 and 2021 news translation shared tasks (Barrault et al., 2020; Akhbardeh et al., 2021), the WMT 2019 and 2020 robustness shared tasks (Li et al., 2019; Specia et al., 2020), and the IWSLT 2021 simultaneous translation task (Anastasopoulos et al., 2021).…”