2017
DOI: 10.3233/isu-160816

A semi-automatic approach for detecting dataset references in social science texts

Abstract: Today, full-texts of scientific articles are often stored in different locations than the used datasets. Dataset registries aim at a closer integration by making datasets citable but authors typically refer to datasets using inconsistent abbreviations and heterogeneous metadata (e.g. title, publication year). It is thus hard to reproduce research results, to access datasets for further analysis, and to determine the impact of a dataset. Manually detecting references to datasets in scientific articles…

Cited by 4 publications
(6 citation statements)
References 21 publications
“…Subsequently, Duck et al (2013) applied local cues and cross-mentioned cues as features to design a rule-based entity recognition system, BioNerDS, to identify data sets and software names in medical science literature. Furthermore, Ghavimi et al (2016) proposed a semi-automatic approach based on special features extracted from data set titles to find data set references and links. Compared to manual methods, rule-based methods enable semi-automatic identification of data set entities across numerous documents.…”
Section: Literature Review | Citation type: mentioning
confidence: 99%
“…[30]. Research on the dataset name extraction task uses a great variety of methods throughout the NER spectrum, including, but not limited to, the following: rule-based, BiLSTM-CRF and BERT [3, 6–14].…”
Section: Related Work | Citation type: mentioning
confidence: 99%
“…The advantage of BiLSTM is that it is better at predicting long sequences, predicting every word individually [41], while CRF predicts based on the joint probability of the whole sentence, making sure that the optimal sequence of tags is achieved [41–43]. To date, the BiLSTM-CRF based model produces the best performance on the dataset extraction task, with an F1 score of 0.85 [8].…”
Section: Related Work | Citation type: mentioning
confidence: 99%
“…Mathiak and Boland [12] and Ghavimi et al. [13] explored variations in citation practices for social sciences datasets, and the latter study proposed a linked data approach to solve this issue by developing an ontology.…”
Section: Introduction | Citation type: mentioning
confidence: 99%