2017
DOI: 10.7287/peerj.preprints.2993
Preprint

bioPDFX: preparing PDF scientific articles for biomedical text mining

Abstract: Background. There is a huge amount of full-text biomedical literature available in public repositories such as PubMed Central (PMC). However, a substantial number of the papers are in Portable Document Format (PDF) and are not available in a plain-text format ready for text mining and natural language processing (NLP). Although many PDF-to-text converters exist, they still face several challenges when processing biomedical PDFs, such as the correct transcription of titles/abstracts, segmenting references/ack…

Cited by 1 publication (1 citation statement)
References 12 publications
“…For other papers, we collected 965 full texts in PDF. These PDF files were all transcribed into XML using bioPDFX [ 46 ], a tool that we built on top of PDFX [ 47 ], to prepare PDF papers for biomedical text mining.…”
Section: Methods
Citation type: mentioning
confidence: 99%