2007
DOI: 10.1007/s10590-008-9040-7

Automatic extraction of translations from web-based bilingual materials

Abstract: This paper describes the framework of the StatCan Daily Translation Extraction System (SDTES), a computer system that maps and compares web-based translation texts of Statistics Canada (StatCan) news releases in the StatCan publication The Daily. The goal is to extract translations for translation memory systems, for translation terminology building, for cross-language information retrieval and for corpus-based machine translation systems. Three years of officially published statistical news release t…

Cited by 1 publication
(1 citation statement)
References 19 publications
“…The collections will grow and expand in the future. In addition, it also includes a large collection of officially released Canada Hansard materials from the 35th Parliament in 1994 to the 40th Parliament at the end of 2008. All the text segments in the Hansard collection are aligned and filtered using the algorithms that are extended from those described in our previous work (Zhu et al 2007). When we search for translations of frequently used words (stop words not included), returned hits of aligned translation pairs can easily exceed the 10,000 mark.…”
mentioning
confidence: 99%