2005
DOI: 10.1007/11551362_9

From Legacy Documents to XML: A Conversion Framework

Cited by 10 publications (8 citation statements)
References 4 publications
“…Several investigators have previously identified the hierarchical tree as the appropriate logical model for tables of contents [4,6,5], especially in regard to the goal of mapping to XML output. Here, instead of propelling directly to one or another parsing algorithm, we expand on the functional roles comprising such a tree, and their mappings to layout and attribute cues.…”
Section: A Logical Representation Scheme (mentioning)
confidence: 99%
“…For example, parsing TOCs of published material provides useful cross-linking of content headings and content location during paper to digital conversion. Some approaches exploit this constraint by devising TOC parsing algorithms based on matching items in the TOC with chapter and section headings in the body of the book or article [5,6,8].…”
Section: Introduction (mentioning)
confidence: 99%
“…ALDAI offers an extended mechanism for the feature definition and management. By default, it offers a basic set of features extracted from layout-oriented documents [1]. In addition, it offers a high level script language in which to define new features or to customize existing features.…”
Section: Active Learning Component (mentioning)
confidence: 99%
“…At Xerox Research Centre Europe, we are undertaking the Legacy Document Conversion project [1] which is aimed at automating different subtasks of mass document conversion to XML.…”
Section: Introduction (mentioning)
confidence: 99%
“…At Xerox Research Centre Europe, we are undertaking the Legacy Document Conversion (LegDoC) project [2] which is aimed at automating different subtasks of overall mass document conversion to XML. A typical conversion task starts with a large collection of documents in PDF or HTML.…”
Section: Introduction (mentioning)
confidence: 99%