Proceedings of the 2010 ACM Symposium on Applied Computing 2010
DOI: 10.1145/1774088.1774091

Enhancing document structure analysis using visual analytics

Abstract: During the last decade national archives, libraries, museums and companies started to make their records, books and files electronically available. In order to allow efficient access of this information, the content of the documents must be stored in database and information retrieval systems. State-of-the-art indexing techniques mostly rely on the information explicitly available in the text portions of documents. Documents usually contain a significant amount of implicit information such as their logical str…

Cited by 19 publications (16 citation statements)
References: 15 publications
“…Andreas Stoffel, from the Department of Computer and Information Science, University of Konstanz, Germany, participated with a trainable system [9], [10] for the analysis of PDF documents based on the PDFBox library. After initial column and reading-order detection, logical classification is performed on the line level.…”
Section: E Stoffel's System (mentioning)
confidence: 99%
“…The Information retrieval system was integrated with a text editor in order to find similar documents. They analyzed the extraction of logical structure from different text document formats [4], [5] and also from source code documents. The second area was the extraction of semantically coherent blocks of text from documents [6].…”
Section: Related Work (mentioning)
confidence: 99%
“…Partitioning scholarly documents comes under a wide research problem known as logical structure extraction (LSE) of semistructured documents, and is not the focus of this work. Fortunately, there are efficient LSE solutions addressed in recent literature (Burget, 2007; Luong et al, 2010; Ratté et al, 2007; Stoffel et al, 2010). In this work, we employ Luong's LSE (Luong et al, 2010) developed by the National University of Singapore (NUS), and available for free use or adaptation within other tools under the Lesser GNU Public License (LGPL) 5…”
Section: Related Work (mentioning)
confidence: 99%
“…Structural components are subject to interpretation by the reader, but also can be identified automatically using LSE methods (Burget, 2007; Luong et al, 2010; Ratté et al, 2007; Stoffel et al, 2010). In this work, we use SectLabel tool (Luong et al, 2010) to extract the logical structure of scientific publications.…”
Section: Segmentation Of Scientific Publications (mentioning)
confidence: 99%