Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries 2018
DOI: 10.1145/3197026.3197048

Machine Learning vs. Rules and Out-of-the-Box vs. Retrained

Abstract: Bibliographic reference parsing refers to extracting machine-readable metadata, such as the names of the authors, the title, or the journal name, from bibliographic reference strings. Many approaches to this problem have been proposed so far, including regular expressions, knowledge bases and supervised machine learning. Many open source reference parsers based on various algorithms are also available. In this paper, we apply, evaluate and compare ten reference parsing tools in a specific business use case. The too…

Cited by 34 publications (11 citation statements)
References 33 publications
“…Ahmad and Afzal (2018) evaluate GROBID for detecting inline citations using a corpus of 5k CiteSeer papers, and found GROBID to have an F1 score of 0.89 on this task. Tkaczyk et al (2018) report GROBID as the best among 10 out-of-the-box tools for parsing bibliographies, also achieving an F1 of 0.89 in an evaluation corpus of 9.5k papers.…”
Section: Discussion
Citation type: mentioning
confidence: 95%
“…Popular approaches include regular expressions, knowledge bases, supervised machine learning, and hybrid approaches. Regular expressions are usually combined with additional approaches, for example with knowledge bases such as thesauri or ontologies, however in such approaches the system must first be filled with available knowledge [13]. Recently in 2019, an unsupervised rule-based approach was proposed that identifies units in source data and provides a corresponding semantic representation based on NASA's QUDT (Quantity, Unit, Dimension and Type) ontology using Arpeggio as a grammar parser [11].…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…In a supervised machine learning-based approach, measurement parsing is usually formally defined as a sequence labelling problem encompassing a variety of tasks, e.g. part-of-speech (POS) tagging or named-entity recognition (NER) [13]. Most of the existing tools are trainable, which means that they are able to automatically learn complex features and adapt parsing rules from training data.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…These RME methods have their merits and shortcomings, as presented in Tkaczyk, Collins, Sheridan, and Beel's (2018) comprehensive comparison of these methods. Although machine learning‐based approaches require minimal human involvement (aside from annotation on training data) to obtain satisfactory performance, they suffer from data sparseness and a lack of generality.…”
Section: Related Work
Citation type: mentioning
confidence: 99%