Software Engineering, Testing, and Quality Assurance for Natural Language Processing (SETQA-NLP '08), 2008
DOI: 10.3115/1622110.1622119

Parallel implementations of word alignment tool

Abstract: Training word alignment models on large corpora is a very time-consuming process. This paper describes two parallel implementations of GIZA++ that accelerate word alignment. One implementation runs on computer clusters; the other runs on multi-processor systems using multi-threading. Results show a near-linear speedup in the number of CPUs used, while alignment quality is preserved.
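
To make the parallelization strategy concrete, the sketch below shows the shared idea behind both kinds of implementation: each worker collects expected alignment counts over its own shard of the corpus, and the counts are merged before the next EM iteration. This is a minimal illustration, not the paper's code; the IBM Model 1 E-step, the function names, and the multiprocessing setup are all our assumptions.

    # Illustrative sketch (not the paper's code): parallelizing one EM
    # iteration of IBM Model 1 by sharding the corpus across worker
    # processes, collecting fractional counts locally, then merging.
    from collections import defaultdict
    from multiprocessing import Pool

    def e_step_shard(args):
        """Collect expected alignment counts for one corpus shard."""
        shard, t = args  # t: translation probabilities {(src, tgt): p}
        counts = defaultdict(float)
        totals = defaultdict(float)
        for src_sent, tgt_sent in shard:
            for tgt in tgt_sent:
                # Normalizer: sum of t(tgt|src) over the source sentence.
                z = sum(t.get((src, tgt), 1e-12) for src in src_sent)
                for src in src_sent:
                    c = t.get((src, tgt), 1e-12) / z
                    counts[(src, tgt)] += c
                    totals[src] += c
        return counts, totals

    def em_iteration(corpus, t, n_workers=4):
        """One EM iteration with the E-step fanned out over processes."""
        shards = [corpus[i::n_workers] for i in range(n_workers)]
        with Pool(n_workers) as pool:
            results = pool.map(e_step_shard, [(s, t) for s in shards])
        # Merge per-worker counts: the only synchronization point.
        counts = defaultdict(float)
        totals = defaultdict(float)
        for c, tot in results:
            for k, v in c.items():
                counts[k] += v
            for k, v in tot.items():
                totals[k] += v
        # M-step: renormalize the translation table.
        return {(src, tgt): v / totals[src] for (src, tgt), v in counts.items()}

Because the E-step dominates the runtime and the shards are independent, merging counts is the only synchronization step, which is why this style of parallelization can stay close to linear in the number of CPUs.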

Cited by 212 publications (138 citation statements). References 8 publications.
“…In fact, several recent articles have reported on reproducibility and/or replication problems in the HLT field (e.g., Johnson et al. 2007; Poprat et al. 2008; Gao and Vogel 2008; Caporaso et al. 2008; Kano et al. 2009; Fokkens et al. 2013; Hagen et al. 2015), and two recent workshops have addressed the need for replication and reproduction of HLT results. However, there is no established venue for publications on the topic, and perhaps more problematically, research that investigates existing methods rather than introducing new ones is often implicitly discouraged in the process of peer review.…”
Mentioning | Confidence: 99%
“…We again compare the benchmark with sets of automatically reordered Chinese sentences generated in the same way as in the first scenario. Word alignments between Chinese and Japanese are produced by MGIZA++ [97] in a file named ch-ja.A3.final. In this file, parallel sentence pairs (Chinese and Japanese) are aligned to each other as follows:…”
Section: Discussion | Mentioning | Confidence: 99%
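
For readers unfamiliar with the file named in that excerpt: in the standard GIZA++/MGIZA++ *.A3.final layout, each sentence pair occupies three lines: a "# Sentence pair …" header, the target sentence, and the source sentence annotated with per-token alignment sets such as "Haus ({ 2 })". A minimal reader, sketched under that assumption (the name read_a3 and the regex are ours), might look like:

    # Illustrative parser for the GIZA++/MGIZA++ *.A3.final format; this is
    # our sketch of the standard layout, not code from the cited papers.
    import re

    # One source token followed by its set of aligned target positions.
    TOKEN_RE = re.compile(r'(\S+)\s+\(\{\s*([\d\s]*)\}\)')

    def read_a3(path):
        """Yield (target_tokens, [(source_token, [target_positions])])."""
        with open(path, encoding='utf-8') as f:
            lines = [line.rstrip('\n') for line in f]
        for i in range(0, len(lines) - 2, 3):
            header, target, source = lines[i], lines[i + 1], lines[i + 2]
            assert header.startswith('# Sentence pair')
            tgt_tokens = target.split()
            alignment = [(tok, [int(p) for p in pos.split()])
                         for tok, pos in TOKEN_RE.findall(source)]
            yield tgt_tokens, alignment

Positions are 1-based into the target sentence, and a leading NULL token collects target words left unaligned.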
“…The standard Moses [96] baseline was used, where reordered Chinese sentences were paired with their Japanese counterparts and word-to-word alignments were estimated by using MGIZA++ [9,97].…”
Section: Et Cetera | Mentioning | Confidence: 99%
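
As a rough illustration of such a baseline setup, the Moses training script can be pointed at MGIZA++ for the alignment step. The invocation below is a sketch under our assumptions: the flag names follow the Moses documentation as we recall it, the paths are placeholders, and none of it is the cited paper's actual command.

    # Hypothetical wiring of MGIZA++ into a Moses baseline training run;
    # paths are placeholders and the flags should be checked against the
    # Moses documentation before use.
    import subprocess

    subprocess.run(
        [
            "perl", "train-model.perl",
            "--root-dir", "work",
            "--corpus", "corpus/train",        # expects corpus/train.zh / .ja
            "--f", "zh", "--e", "ja",
            "--external-bin-dir", "/path/to/mgiza/bin",
            "-mgiza",                          # use MGIZA++ instead of GIZA++
            "-mgiza-cpus", "8",                # alignment threads
        ],
        check=True,
    )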
“…First, each entry is segmented with the BPE rules available along with the pre-trained Nematus model. Then, the segmented entries are aligned by running MGiza++ (Gao and Vogel, 2008) trained on the BPE-level WMT'16 training data. Finally, all the one-to-one aligned sub-units are extracted to form the sub-word level bilingual term dictionaries.…”
Section: Experimental Setting | Mentioning | Confidence: 99%
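
The final extraction step in that excerpt lends itself to a short sketch. Assuming the symmetrized alignments come in the common Pharaoh format ("0-0 1-2 …") over BPE-segmented sentence pairs, keeping only links that are one-to-one on both sides looks roughly like this; all names are illustrative, not from the cited paper.

    # Hedged sketch of the "keep only one-to-one links" step, assuming
    # Pharaoh-style alignment strings over BPE-segmented token lists.
    from collections import Counter

    def one_to_one_pairs(src_tokens, tgt_tokens, alignment_line):
        """Return (src, tgt) pairs whose link is one-to-one on both sides."""
        links = [tuple(map(int, l.split('-'))) for l in alignment_line.split()]
        src_deg = Counter(s for s, _ in links)   # links per source position
        tgt_deg = Counter(t for _, t in links)   # links per target position
        return [(src_tokens[s], tgt_tokens[t])
                for s, t in links
                if src_deg[s] == 1 and tgt_deg[t] == 1]

    # Tiny usage example: only unambiguous links survive the filter.
    src = "ein Haus@@ es".split()
    tgt = "of a house".split()
    print(one_to_one_pairs(src, tgt, "0-1 1-2 2-2"))  # [('ein', 'a')]

Aggregating these pairs over the whole corpus yields the sub-word level dictionary described in the excerpt.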