2019
DOI: 10.1145/3314936

Leveraging Additional Resources for Improving Statistical Machine Translation on Asian Low-Resource Languages

Abstract: Phrase-based machine translation (MT) systems require large bilingual corpora for training. Nevertheless, such corpora are unavailable for most language pairs in the world, creating a bottleneck for the development of MT. The Asian language pairs considered here (Japanese, Indonesian, and Malay, each paired with Vietnamese) are no exception: no large bilingual corpora exist for these low-resource pairs. Furthermore, although the languages are widely used in the world, there …

Cited by 7 publications (5 citation statements)
References 31 publications
“…This table examines the model's ability to handle complex language structures through specific example sentences. Among the models, the Multi-Perspective Training Model produces the most accurate translations and the best understanding of language structure when faced with complex clause structures and culturally specific vocabulary, which shows that Multi-Perspective Training is of great help for such translation problems [34,35]. Fig.…”
Section: E. Experimental Results and Tables (mentioning, confidence: 97%)
“…this method employs multilingual BERT to construct word embeddings of the source and target sentences for nearest-neighbour search, together with self-training to adapt the models. On the BUCC 2017 bitext mining task, we validate our approach by retrieving parallel sentences and comparing it to earlier unsupervised methods [39,41].…”
Section: Related Work (mentioning, confidence: 99%)
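The approach described in that statement is a standard embedding-based bitext mining recipe: embed source- and target-language sentences with a multilingual encoder, then match nearest neighbours by cosine similarity. A minimal sketch of the matching step follows; the `embed` stub, the 768-dimension size, and the 0.8 threshold are illustrative assumptions, not details from the cited paper.

```python
# Minimal sketch of nearest-neighbour bitext mining over sentence embeddings.
import numpy as np

def embed(sentences):
    # Stand-in for a multilingual encoder (e.g., mean-pooled mBERT states);
    # random vectors keep this sketch self-contained and runnable.
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(sentences), 768))

def mine_bitext(src_sents, tgt_sents, threshold=0.8):
    """Pair each source sentence with its nearest target sentence by
    cosine similarity, keeping only pairs above a score threshold."""
    src, tgt = embed(src_sents), embed(tgt_sents)
    src /= np.linalg.norm(src, axis=1, keepdims=True)  # unit-normalize rows
    tgt /= np.linalg.norm(tgt, axis=1, keepdims=True)
    sims = src @ tgt.T                                  # cosine similarities
    best = sims.argmax(axis=1)                          # nearest neighbour per source
    return [(src_sents[i], tgt_sents[j], float(sims[i, j]))
            for i, j in enumerate(best) if sims[i, j] >= threshold]
```

In a self-training loop, pairs mined this way would be added to the training data and the encoder re-adapted, repeating until the mined set stabilizes.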
“…Data augmentation. The amount of data extracted from books is usually insufficient to reach a promising number of training samples, so deep learning-based models may fail to achieve satisfactory prediction results [36]. In this work, the speaker corpus of Records contains only 1,248 items, and only 806 samples (64.5%) are labeled after transcribing the entire text.…”
Section: Speakers Extraction (mentioning, confidence: 99%)
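When only part of a corpus is labeled, as with the 806 of 1,248 items above, a common remedy is to enlarge the labeled set with lightly perturbed copies of each sample. Below is a minimal sketch of token-level augmentation (random deletion plus one adjacent swap); the `augment` helper and its parameters are hypothetical, not taken from the cited work.

```python
# Minimal sketch of token-level data augmentation for a small labeled set.
import random

def augment(sentence, p_drop=0.1):
    """Return a noisy copy of a sentence: randomly drop tokens with
    probability p_drop, then swap one adjacent token pair."""
    tokens = sentence.split()
    kept = [t for t in tokens if random.random() > p_drop] or tokens[:]
    if len(kept) > 1:
        i = random.randrange(len(kept) - 1)
        kept[i], kept[i + 1] = kept[i + 1], kept[i]
    return " ".join(kept)

# Triple a tiny labeled speaker-attribution set with two noisy copies each.
labeled = [('"Hello," Alice said to Bob.', "Alice")]
augmented = labeled + [(augment(text), speaker)
                       for text, speaker in labeled for _ in range(2)]
```

The label is carried over unchanged on the assumption that small surface perturbations do not alter who is speaking; heavier techniques such as back-translation would need the same check.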