Proceedings of the 10th Annual Joint Conference on Digital Libraries 2010
DOI: 10.1145/1816123.1816126

Transferring structural markup across translations using multilingual alignment and projection

Abstract: We present here a method for automatically projecting structural information across translations, including canonical citation structure (such as chapters and sections), speaker information, quotations, markup for people and places, and any other element in TEI-compliant XML that delimits spans of text that are linguistically symmetrical in two languages. We evaluate this technique on two datasets, one containing perfectly transcribed texts and one containing errorful OCR, and achieve an accuracy rate of 88.2%…
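As a rough illustration of what projecting a markup span across a translation can look like, the sketch below maps a span of source tokens onto the target side through word-alignment links. It is not the authors' implementation: the function name `project_span`, the min/max covering heuristic, and the toy Latin/English alignment are all assumptions made for this example.

```python
from typing import Iterable, Optional, Tuple


def project_span(
    span: Tuple[int, int],
    alignment: Iterable[Tuple[int, int]],
) -> Optional[Tuple[int, int]]:
    """Project an inclusive source token span onto the target side.

    `alignment` holds (source_index, target_index) word-alignment links.
    Returns the smallest inclusive target span covering every target token
    linked to a source token inside `span`, or None if nothing is linked.
    This min/max heuristic is an assumption for illustration only.
    """
    start, end = span
    targets = [t for s, t in alignment if start <= s <= end]
    if not targets:
        return None
    return min(targets), max(targets)


# Hypothetical toy data: a Latin source and an English translation.
source = ["Gallia", "est", "omnis", "divisa"]
target = ["All", "Gaul", "is", "divided"]
# Hand-written links: Gallia-Gaul, est-is, omnis-All, divisa-divided.
links = [(0, 1), (1, 2), (2, 0), (3, 3)]

# A markup element covering "Gallia est omnis" in the source ...
projected = project_span((0, 2), links)
# ... is mapped to the target tokens it aligns with.
if projected is not None:
    lo, hi = projected
    print(target[lo:hi + 1])
```

A covering span is the simplest choice when the word order differs between the two languages; a real system would also need to handle unaligned tokens and conflicting links.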

Cited by 8 publications (7 citation statements)
References 32 publications
“…Projection has also been used by Bentivogli and Pianta (2005) to create a parallel version of an existing corpus. The projection of structural information between parallel documents is tackled by Bamman et al (2010), where alignment is performed firstly sentence-wise (1-1) and then word-wise.…”
Section: Related Work (mentioning)
confidence: 99%
“…The works of Das and Petrov (2011), Moore (2002), Quan et al (2018), and Zamani et al (2016) are all based on symmetric corpora, while imposing a 1-1 alignment is not possible in our domain. Bamman et al (2010), Bentivogli and Pianta (2005), Eger et al (2018), Fossum and Abney (2005), and Yarowsky et al (2001) all address word-level alignment instead of sentence level-alignment. These methods cannot be extended to address sentence-level alignment without introducing an element of subjectivity, for instance, in resolving conflicts of words matched across different sentences.…”
Section: Related Work (mentioning)
confidence: 99%
“…Work towards generating computational hypotheses for the types of annotation listed above relies on algorithms developed by Saeed Majidi at Tufts (Majidi and Crane, 2013); while development by David Bamman at Carnegie Mellon University informs work towards the automatic markup on documents through aligned translations (Bamman et. al, 2010).…”
Section: Technical Infrastructure (mentioning)
confidence: 99%
“…The size of this collection has made it a natural target for topic modeling [27] (including evaluating topic coherence [30]), where the immensity of the data encourages automatic methods for characterizing it, as well as research into automatic methods for adding structure [3]. More recently, however, researchers have begun to exploit these massive datasets for measuring historical change.…”
Section: Related Work (mentioning)
confidence: 99%