Proceedings of the 2005 ACM Symposium on Document Engineering 2005
DOI: 10.1145/1096601.1096638

Classifying XML tags through "reading contexts"

Cited by 6 publications (10 citation statements)
References 1 publication
“…From a more syntactical point of view, Tannier et al [38] associate each (XML) element in a document with one of three different categories: hard elements - elements that are commonly used to structure the document content in different blocks and usually interrupt the linearity of a text, such as paragraphs and sections; soft elements - elements that identify significant text fragments and are transparent while reading the text, such as emphasis and links; and jump elements - elements that are logically detached from the surrounding text, and that give access to related information, such as footnotes and comments.…”
Section: Existing Models Describing Document Components
Citation type: mentioning
confidence: 99%
“…We performed a preliminary test (fully described in [15]) on a dataset consisting of 117 scientific papers encoded in DocBook and published between 2008 and 2011 in the Balisage Series Conferences 38 . The documents vary a lot in their internal structure and size: from 3 Kbytes to 160 Kbytes, with an average size of about 60 Kbytes.…”
Section: Retrieving Structures From XML Sources
Citation type: mentioning
confidence: 99%
“…Some literature has recently come out about the characterization and identification of structural patterns of text documents. For instance, Tannier, Girardot, and Mathieu (), starting from previous works by Lini, Lombardini, Paoli, Colazzo, and Sartiani () and Colazzo et al. (), describe an algorithm to assign each XML element in a document to one of three different categories: hard tag, soft tag, and jump tag.…”
Section: Structural Patterns
Citation type: mentioning
confidence: 99%
“…Tannier et al. () also introduce algorithms to assign XML elements to these categories by means of natural language processing (NLP) tools. This classification is rather interesting, in that it provides a justification for the identification of the classes, but it is a little coarse for our purposes, ignoring empty elements and failing to distinguish higher level and lower level hard tags (i.e., those containing other tags but not text from those that never contain text).…”
Section: Structural Patterns
Citation type: mentioning
confidence: 99%
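
The hard/soft/jump distinction quoted in the citation statements above can be illustrated with a small sketch. This is not the algorithm of Tannier et al., which the statements say infers the categories with NLP tools. Here the categories are assumed to be already known and are given by a hand-written, hypothetical table (`CATEGORY`), and the function name `read_blocks` is likewise invented for illustration. The sketch only shows what each category means for reading order: hard elements split the text into blocks, soft elements are transparent, and jump elements are set aside.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written mapping from tag names to categories.
# The cited paper derives such categories automatically; this table is an assumption.
CATEGORY = {
    "section": "hard", "para": "hard",
    "emphasis": "soft", "link": "soft",
    "footnote": "jump", "comment": "jump",
}


def read_blocks(xml_text):
    """Return (blocks, detached): linear text blocks and the text of jump elements."""
    root = ET.fromstring(xml_text)
    blocks, detached = [], []
    current = []  # text fragments of the block being read

    def flush():
        # Close the current block, normalizing whitespace.
        text = " ".join("".join(current).split())
        if text:
            blocks.append(text)
        current.clear()

    def walk(elem):
        # Assumption: tags missing from the table are treated as hard.
        kind = CATEGORY.get(elem.tag, "hard")
        if kind == "jump":
            # Detached from the surrounding text: collected separately.
            detached.append(" ".join("".join(elem.itertext()).split()))
        else:
            if kind == "hard":
                flush()  # a hard element interrupts the linear text
            current.append(elem.text or "")
            for child in elem:
                walk(child)
            if kind == "hard":
                flush()
        # Text following the element belongs to the parent's flow.
        current.append(elem.tail or "")

    walk(root)
    flush()
    return blocks, detached


sample = (
    "<section><para>Tags are <emphasis>soft</emphasis> here"
    "<footnote>See note.</footnote>.</para>"
    "<para>Second block.</para></section>"
)
blocks, detached = read_blocks(sample)
print(blocks)
print(detached)
```

In this sample, the emphasized word stays inside its sentence, the footnote text is removed from the sentence and kept apart, and each paragraph becomes its own block.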