2003
DOI: 10.1117/12.528808

Hierarchical logical structure extraction of book documents by analyzing tables of contents

Abstract: Logical structure extraction from book documents is important for the automatic construction of electronic document databases. The tables of contents in a book play an important role in representing the overall logical structure and reference information of the book. In this paper, a new method is proposed to extract the hierarchical logical structure of book documents, together with the reference information, by combining the spatial and semantic information of the tables of contents. Experimental resu…

Cited by 11 publications (12 citation statements)
References 2 publications
“…This is the fact that previous researches mostly depend on. Our method is similar to that presented by Feng et al in 5 , but we only extract the chapter structure instead of the whole hierarchical logical structure.…”
Section: Methods 4: Text Matching (mentioning)
confidence: 92%
“…Feng et al 5 exploited the indentation, page numbers and numbering scheme to compute the logical structure of a book. Belaïd et al 6 proposed a labeling approach to recognize the TOC of scientific journal in the Calliope electronic library, extracted the page numbers from the TOC and used them to find the starting page of each article.…”
Section: Introduction (mentioning)
confidence: 99%
“…The structures of page number, header, footer, headline, figure and body text are analyzed and matched with information on the contents pages to reconstruct the links between ToC and body text. He et al [17] propose a method to extract the hierarchical logical structure of book documents, along with the reference information, by combining the spatial and the semantic information of ToC in a book.…”
Section: Related Work (mentioning)
confidence: 99%
“…Lin et al [4] introduced a system of TOC page analysis using layout modeling and headline matching, and acquired the logical structure of the TOC through in-depth analysis of its numbering scheme. He et al [5] combined geometrical rules (indentations) and semantic rules (typical text sequences identifying chapters and sections) to extract the hierarchical structure in Chinese books.…”
Section: Related Work (mentioning)
confidence: 99%
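
The citation statements above describe the general idea of combining geometric cues (indentation) with textual cues (numbering scheme, page numbers) to recover a hierarchy from a table of contents. The following Python sketch illustrates that idea only; it is not the method of the cited paper. The entry pattern `ENTRY`, the `Node` class, and the `parse_toc` function are hypothetical names chosen for illustration, and the sketch assumes plain-text ToC lines that end in an Arabic page number.

```python
import re
from dataclasses import dataclass, field

# Hypothetical entry pattern (an assumption, not taken from the paper):
# leading indent, optional numbering such as "2.1", a title,
# dot leaders or spaces, and a trailing Arabic page number.
ENTRY = re.compile(
    r"^(?P<indent>\s*)"
    r"(?:(?P<num>\d+(?:\.\d+)*)\.?\s+)?"
    r"(?P<title>.+?)[\s.]*(?P<page>\d+)\s*$"
)

@dataclass
class Node:
    title: str
    page: int
    level: int
    children: list = field(default_factory=list)

def parse_toc(lines):
    """Build a tree from ToC lines using numbering depth, else indentation."""
    rows = [m for m in (ENTRY.match(line) for line in lines) if m is not None]
    # Geometric cue: rank the distinct indent widths, smallest = level 1.
    widths = sorted({len(m["indent"]) for m in rows})
    root = Node("ROOT", 0, 0)
    stack = [root]
    for m in rows:
        if m["num"]:
            # Textual cue: "2.1" has one dot, so it sits at level 2.
            level = m["num"].count(".") + 1
        else:
            level = widths.index(len(m["indent"])) + 1
        node = Node(m["title"].strip(), int(m["page"]), level)
        # Pop back to the nearest ancestor with a strictly smaller level.
        while stack[-1].level >= level:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root

sample = [
    "1 Introduction ........ 1",
    "  1.1 Background ...... 3",
    "  1.2 Scope ........... 5",
    "2 Methods ............. 9",
    "  Data collection ..... 10",
    "Index ................. 40",
]
tree = parse_toc(sample)
for chapter in tree.children:
    print(chapter.title, chapter.page, [c.title for c in chapter.children])
```

In this sketch the numbering scheme takes priority and indentation is only a fallback for unnumbered entries. Lines that do not end in an Arabic page number (for example, front matter paginated with Roman numerals) are skipped, which a real system would need to handle.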