This study constructed an error-tagged evaluation corpus for Japanese grammatical error correction (GEC). Evaluation corpora are essential for assessing model performance. The availability of various evaluation corpora for English GEC has enabled comprehensive comparison between models and supported the growth of the English GEC community. However, the development of the Japanese GEC community has been hindered by the lack of available evaluation corpora for Japanese GEC. We therefore constructed a new evaluation corpus for Japanese GEC and made it publicly available. We used texts written by Japanese language learners in the Lang-8 corpus, a representative learner corpus in GEC, to create the
We introduce our TMU Japanese-to-English system, which employs a semi-autoregressive model, for the WAT 2021 (Nakazawa et al., 2021) restricted translation task. In this task, an input sentence must be translated under the constraint that certain words, called restricted target vocabularies (RTVs), appear in the output sentence. To satisfy this constraint, we use the semi-autoregressive model RecoverSAT (Ran et al., 2020) because of its ability (known as "forced translation") to insert specified words into the output sentence. When using "forced translation," the order in which RTVs are inserted is critical. In our system, we obtain word alignments between a source sentence and the corresponding RTVs and then sort the RTVs by the order of their corresponding words or phrases in the source sentence. Using the model with RTVs in sorted order, we succeeded in inserting all the RTVs into the output sentences for more than 96% of the test sentences. Moreover, we confirmed that sorting RTVs improved the BLEU score compared with RTVs in random order.
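The RTV sorting step can be illustrated with a minimal sketch. It assumes the word alignment is already available as a mapping from each RTV to the source token indices it aligns with. The abstract does not specify the alignment tool or data format, so the function name `sort_rtvs_by_source_order`, the dictionary layout, and the toy data are all hypothetical. Placing unaligned RTVs last is also an assumption, not something the abstract describes.

```python
from typing import Dict, List, Sequence, Tuple


def sort_rtvs_by_source_order(
    rtvs: Sequence[str],
    alignment: Dict[str, List[int]],
) -> List[str]:
    """Order RTVs by the position of their aligned source words.

    `alignment` maps an RTV to the source token indices it aligns with
    (a hypothetical format chosen for this sketch).
    """

    def sort_key(item: Tuple[int, str]) -> Tuple[float, int]:
        original_index, rtv = item
        positions = alignment.get(rtv, [])
        # Use the earliest aligned source position; RTVs without any
        # alignment are placed last (an assumption of this sketch).
        first = float(min(positions)) if positions else float("inf")
        # The original index breaks ties so the result is deterministic.
        return (first, original_index)

    return [rtv for _, rtv in sorted(enumerate(rtvs), key=sort_key)]


# Toy example with invented data: the RTVs arrive in arbitrary order.
rtvs = ["method", "semiconductor device", "manufacturing"]
alignment = {
    "semiconductor device": [0, 1],  # aligned to source tokens 0 and 1
    "manufacturing": [3],
    "method": [4],
}
ordered = sort_rtvs_by_source_order(rtvs, alignment)
print(ordered)
```

The sorted list would then be handed to the forced-translation step in that order. How the model consumes it depends on the RecoverSAT implementation and is not shown here.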