2009
DOI: 10.1007/978-3-642-04346-8_62

GROBID: Combining Automatic Bibliographic Data Recognition and Term Extraction for Scholarship Publications

Abstract: Based on state-of-the-art machine learning techniques, GROBID (GeneRation Of BIbliographic Data) performs reliable bibliographic data extraction from scholarly articles, combined with multi-level term extraction. These two types of extraction present synergies and correspond to complementary descriptions of an article. This tool is viewed as a component for enhancing the existing and future large repositories of technical and scientific publications. Objectives. The purpose of this demonstration is…

Cited by 182 publications (153 citation statements). References: 2 publications.
“…In a recent survey and evaluation [of] several non-commercial reference parsing tools, Tkaczyk et al (2018) found that the best three performing ones all use a CRF approach: GROBID (Lopez, 2009), CERMINE (Tkaczyk et al, 2015) and ParsCit (Councill et al, 2008). All three benefit from task-specific tuning using extra annotated data, with GROBID showing the best off-the-shelf results.…”
Section: Related Work (mentioning, confidence: 99%)
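To make "a CRF approach" to reference parsing concrete, here is a toy sketch of the decoding step of a linear-chain CRF: each token of a reference string gets a label, and Viterbi search picks the highest-scoring label sequence from per-token (emission) and label-to-label (transition) scores. Everything here is an illustrative assumption, not GROBID's implementation: the labels, the features, the tokenizer, and especially the weights, which are hand-set, whereas a real CRF learns them from annotated data.

```python
import re

LABELS = ["author", "title", "date", "other"]

# Hypothetical hand-set weights; a trained CRF would learn these from annotated references.
EMIT = {
    ("is_year", "date"): 3.0,
    ("initial", "author"): 2.0,
    ("capitalized", "author"): 1.0,
    ("capitalized", "title"): 0.5,
    ("lowercase", "title"): 1.0,
    ("punct", "other"): 2.0,
}
TRANS = {
    ("author", "author"): 1.0,
    ("title", "title"): 1.5,
    ("author", "date"): 0.5,
    ("date", "title"): 0.5,
    ("title", "author"): -2.0,
}


def tokenize(text):
    # Initials like "P." stay whole; words and single punctuation marks are separate tokens.
    return re.findall(r"[A-Z]\.|\w+|[^\w\s]", text)


def features(tok):
    feats = []
    if re.fullmatch(r"(19|20)\d\d", tok):
        feats.append("is_year")
    elif re.fullmatch(r"[A-Z]\.", tok):
        feats.append("initial")
    elif tok[:1].isupper():
        feats.append("capitalized")
    elif tok[:1].islower():
        feats.append("lowercase")
    if re.fullmatch(r"[^\w\s]+", tok):
        feats.append("punct")
    return feats


def emission(tok, label):
    return sum(EMIT.get((f, label), 0.0) for f in features(tok))


def viterbi(tokens):
    """Return the highest-scoring label sequence under the weights above."""
    if not tokens:
        return []
    # score[i][y]: best total score of a label path over tokens[: i + 1] ending in label y
    score = [{y: emission(tokens[0], y) for y in LABELS}]
    back = []
    for tok in tokens[1:]:
        prev, cur, ptr = score[-1], {}, {}
        for y in LABELS:
            best = max(LABELS, key=lambda p: prev[p] + TRANS.get((p, y), 0.0))
            cur[y] = prev[best] + TRANS.get((best, y), 0.0) + emission(tok, y)
            ptr[y] = best
        score.append(cur)
        back.append(ptr)
    # Follow back-pointers from the best final label.
    y = max(LABELS, key=lambda label: score[-1][label])
    path = [y]
    for ptr in reversed(back):
        y = ptr[y]
        path.append(y)
    return path[::-1]


tokens = tokenize("Lopez P. 2009 GROBID: combining automatic bibliographic data recognition")
print(list(zip(tokens, viterbi(tokens))))
```

The transition scores are what separate a CRF from labelling each token independently: here a strong `title -> title` weight pulls the colon after "GROBID" into the title span, even though its emission alone would favour `other`.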
“…The system based on TeamBeam algorithm proposed by Kern et al [13] is able to extract a basic set of metadata from PDF documents using an enhanced Maximum Entropy classifier. Lopez [14] proposes GROBID system for analysing scientific texts in PDF format. GROBID uses CRF in order to extract document's metadata, full text and a list of parsed bibliographic references.…”
Section: State of the Art (mentioning, confidence: 99%)
“…Reference sections are typically located in the documents using heuristics [6,7,16,17] or machine learning [14,18].…”
Section: State of the Art (mentioning, confidence: 99%)
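As an illustration of the heuristic route to locating a reference section, a minimal sketch, assuming the document is already available as plain-text lines: it takes the lines after the last heading that looks like "References" or "Bibliography". The heading pattern and the `locate_references` function are hypothetical and not taken from any of the cited tools; real systems also use layout cues such as fonts and positions, or machine-learned models.

```python
import re

# Hypothetical heading pattern: optional section number, then a common heading word.
HEADING = re.compile(
    r"^\s*(\d+\.?\s*)?(references|bibliography|literature cited)\s*$",
    re.IGNORECASE,
)


def locate_references(lines):
    """Return the non-empty lines after the last reference-like heading, or [] if none."""
    start = None
    for i, line in enumerate(lines):
        if HEADING.match(line):
            start = i + 1  # keep overwriting: the last such heading wins
    if start is None:
        return []
    return [line for line in lines[start:] if line.strip()]


doc = ["1 Introduction", "We cite [1].", "6. References", "[1] first entry", "", "[2] second entry"]
print(locate_references(doc))
```

Preferring the last matching heading is a simple guard against a "References" mention earlier in the text, such as in a table of contents.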
“…In essence, they require to be as neatly associated to the originated researcher or institution, as a warrant for the trustfulness of the content,[20] but also in order to allow an adequate citation of the work. More generally, scientific data have to be, even more than publications, associated with precise metadata (in the same way as what we have for publications with bibliographical data).…”
[20] See in particular the importance of machine learning techniques in this respect (Lopez 2009).
Section: Characterising Research Data (mentioning, confidence: 99%)