2011 IEEE International Conference on Bioinformatics and Biomedicine 2011
DOI: 10.1109/bibm.2011.26

An Automatic System for Extracting Figures and Captions in Biomedical PDF Documents

Abstract: Figures in biomedical articles often constitute direct evidence of experimental results. Image analysis methods can be coupled with text-based methods to improve knowledge discovery. However, automatically harvesting figures along with their associated captions from full-text articles remains challenging. In this paper, we present an automatic system for robustly harvesting figures from biomedical literature. Our approach relies on the idea that the PDF specification of the document layout can be used to ident…

Cited by 14 publications (11 citation statements)
References 11 publications
“…Our PDFBox based extractor (E_pdbx) and 2. Xpdf based extractor (E_xpdf) reported in [4]. Results in table 1 shows that our system performs well in precision and recall.…”
Section: Experiments and Results (mentioning)
confidence: 51%
“…Recent work [4] describes a methodology for extraction of images and captions from PDF files, whereby images are extracted from PDF using Xpdf 5 and captions are extracted using regular expressions and heuristics. We use regular expressions and document layout information for the same task (section 4).…”
Section: Related Work (mentioning)
confidence: 99%
“…Our approach builds upon a new document parsing system [17] that can automatically extract figure-caption pairs from PDF articles. This allows us to efficiently recover figures and then associate them with the corresponding captions.…”
Section: Document Processing (mentioning)
confidence: 99%
“…Specific approaches aiming to extract figures and captions from PDF documents have been recently proposed. Lopez et al [8] and Choudhury et al [9] introduced methods based on available tools (Xpdf and PDFBox respectively), but neither method handles vector graphics within scientific publications.…”
Section: Introduction (mentioning)
confidence: 99%
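
One of the citation statements above describes captions being extracted "using regular expressions and heuristics". The following Python sketch illustrates what such a step might look like on text already extracted from a PDF. It is an assumption-laden illustration, not the method of the cited paper: the pattern `CAPTION_START`, the function `find_captions`, and the rule that a caption runs until the next blank line are all hypothetical.

```python
import re

# Hypothetical pattern: a caption line starts with "Fig. 1.", "Figure 2:", etc.
# The trailing "." or ":" is required so that in-text mentions such as
# "Figure 2 shows ..." are not mistaken for captions.
CAPTION_START = re.compile(r"^\s*fig(?:ure)?\.?\s*(\d+)\s*[.:]\s*(.*)$", re.IGNORECASE)


def find_captions(lines):
    """Map figure numbers to caption text found in a list of text lines.

    Heuristic (an assumption for this sketch): a caption begins on a line
    matching CAPTION_START and continues until the next blank line.
    Only the first caption found for each figure number is kept.
    """
    captions = {}
    current = None  # figure number of the caption being accumulated
    for line in lines:
        match = CAPTION_START.match(line)
        if match:
            number = int(match.group(1))
            if number in captions:
                current = None  # ignore repeated captions for the same figure
            else:
                current = number
                captions[number] = match.group(2).strip()
        elif not line.strip():
            current = None  # a blank line ends the current caption
        elif current is not None:
            # Continuation line: append it to the caption in progress.
            captions[current] = (captions[current] + " " + line.strip()).strip()
    return captions


if __name__ == "__main__":
    sample = [
        "Figure 2 shows the pipeline.",
        "",
        "Fig. 1. Overview of the",
        "extraction pipeline.",
        "",
        "FIGURE 2: Precision and recall.",
    ]
    for number, text in sorted(find_captions(sample).items()):
        print(number, text)
```

A real system would likely combine a pattern like this with layout information (for example, the position of each text block relative to the figure), as the citing statements suggest.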