Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - ACL '03 2003
DOI: 10.3115/1075096.1075106

Reliable measures for aligning Japanese-English news articles and sentences

Abstract: We have aligned Japanese and English news articles and sentences to make a large parallel corpus. We first used a method based on cross-language information retrieval (CLIR) to align the Japanese and English articles and then used a method based on dynamic programming (DP) matching to align the Japanese and English sentences in these articles. However, the results included many incorrect alignments. To remove these, we propose two measures (scores) that evaluate the validity of alignments. The measure for arti…
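
The DP matching step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `jaccard` and `align_sentences`, the token-overlap similarity, and the `skip_cost` penalty are all assumptions made for illustration, and the toy similarity presumes both sides have already been mapped to a shared vocabulary (for example through a bilingual dictionary). The sketch finds a least-cost monotone alignment in which each sentence is either paired with one sentence on the other side or skipped.

```python
from typing import Callable, List, Optional, Tuple


def jaccard(a: str, b: str) -> float:
    """Toy similarity (an assumption): token-set overlap in [0, 1]."""
    ta, tb = set(a.split()), set(b.split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def align_sentences(
    src: List[str],
    tgt: List[str],
    sim: Callable[[str, str], float] = jaccard,
    skip_cost: float = 0.4,  # hypothetical penalty for leaving a sentence unaligned
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of a least-cost monotone 1-1 alignment."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [
        [None] * (m + 1) for _ in range(n + 1)
    ]
    cost[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            # Pair src[i-1] with tgt[j-1]; cost falls as similarity rises.
            if i > 0 and j > 0:
                c = cost[i - 1][j - 1] + (1.0 - sim(src[i - 1], tgt[j - 1]))
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, (i - 1, j - 1)
            # Skip a source sentence.
            if i > 0:
                c = cost[i - 1][j] + skip_cost
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, (i - 1, j)
            # Skip a target sentence.
            if j > 0:
                c = cost[i][j - 1] + skip_cost
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, (i, j - 1)

    # Trace back from the end, keeping only the diagonal (paired) steps.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        if (pi, pj) == (i - 1, j - 1):
            pairs.append((i - 1, j - 1))
        i, j = pi, pj
    pairs.reverse()
    return pairs
```

A lower `skip_cost` makes the sketch more willing to leave dissimilar sentences unaligned; a higher value forces more pairs. Filtering the resulting pairs with a validity score, as the abstract proposes, would be a separate step that is not shown here.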

Cited by 105 publications (71 citation statements)
References: 7 publications

“…They report improvements in word alignments using parallel sentences discovered by their method. Utiyama and Isahara (2003) use CLIR techniques and dynamic programming (DP) to extract sentences from an English-Japanese comparable news corpus. They identify similar article pairs, and then, treating these pairs as parallel texts, align their sentences on a sentence pair similarity score and use DP to find the least-cost alignment over the document pair.…”
Section: Related Work (mentioning)
confidence: 99%
“…The students in the DDL class worked in pairs to explore targeted recurring grammatical structures in a bilingual newspaper corpus (Utiyama & Isahara, 2003), using a parallel concordancer (Paraconc, 2004). Following guidelines on a worksheet, they searched for and examined basic grammatical structures, formed hypotheses about the structures, discussed these with their partners, and recorded their findings.…”
Section: The 2008 Study (mentioning)
confidence: 99%
“…A bottleneck in statistical machine translation is the scarceness of parallel resources for many language pairs and domains. Previous research has shown that this bottleneck can be reduced by utilizing parallel portions found within comparable corpora (Utiyama and Isahara, 2003; Munteanu et al., 2004; AbdulRauf and Schwenk, 2009). These are useful for many purposes, including automatic terminology extraction and the training of statistical MT systems.…”
Section: Introduction (mentioning)
confidence: 99%