2021
DOI: 10.1108/el-10-2020-0301

Data set entity recognition based on distant supervision

Abstract:
Purpose: This paper aims to identify data set entities in scientific literature. To address poor recognition caused by a lack of training corpora in existing studies, a distant supervised learning-based approach is proposed to identify data set entities automatically from large-scale scientific literature in an open domain.
Design/methodology/approach: Firstly, the authors use a dictionary combined with a bootstrapping strategy to create a labelled corpus to apply supervised learning. Secondly, a bidirectional…
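The dictionary step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `label_with_dictionary`, the `DATASET` tag, the BIO scheme, and the case-insensitive longest-match rule are all assumptions made for illustration, and the bootstrapping stage that would grow the dictionary is not shown.

```python
def label_with_dictionary(tokens, dictionary):
    """Assign BIO tags by longest-match lookup against a dictionary of data set names.

    Hypothetical helper: the tag name DATASET and the matching rule are
    illustrative assumptions, not taken from the paper.
    """
    # Store dictionary entries as lowercase token tuples for case-insensitive matching.
    entries = {tuple(name.lower().split()) for name in dictionary}
    max_len = max((len(e) for e in entries), default=0)

    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(t.lower() for t in tokens[i:i + n]) in entries:
                matched = n
                break
        if matched:
            tags[i] = "B-DATASET"
            for j in range(i + 1, i + matched):
                tags[j] = "I-DATASET"
            i += matched
        else:
            i += 1
    return tags


tokens = "We evaluate on Penn Treebank and ImageNet .".split()
tags = label_with_dictionary(tokens, {"Penn Treebank", "ImageNet"})
```

Sentences labelled this way form a noisy training corpus: any data set name missing from the dictionary is tagged "O", which is the gap a bootstrapping strategy is meant to narrow.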

Citations: cited by 5 publications (7 citation statements)
References: 29 publications
“…Entity Replacement (ER) [7] A variant of MR, ER replaces entities with alternative entities from sources other than the original training set.…”
Section: Text Augmentation Methods (mentioning)
confidence: 99%
“…To both adhere to established research and examine the cost-effectiveness of data augmentation across different model types, we opted to utilize both architectures. For data augmentation, we focused on methods that are well regarded in the literature and have demonstrated improvements in the performance of the NER model [4,5,7,27], while also being relatively simple to implement. Hence, we chose Mention Replacement (MR) for augmenting tokens tagged with specific entity types, excluding those labeled "O," and Contextual Word Replacement (CWR) for tokens specifically tagged as "O."…”
Section: Text Augmentation Methods (mentioning)
confidence: 99%
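The augmentation methods named in these citation statements can be sketched roughly as follows. This is an illustrative assumption, not code from the citing paper: the helper names `extract_mentions` and `mention_replacement`, the replacement probability `p`, and the `mention_pool` structure are hypothetical. When the pool is built from the training set the sketch corresponds to Mention Replacement (MR); when it is filled from an external source it corresponds to Entity Replacement (ER). Contextual Word Replacement is not shown.

```python
import random


def extract_mentions(tags):
    """Return (start, end, entity_type) spans from a BIO tag sequence."""
    spans = []
    start, etype = None, None
    # A trailing "O" sentinel closes a mention that ends the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        if start is not None and not tag.startswith("I-"):
            spans.append((start, i, etype))
            start = None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def mention_replacement(tokens, tags, mention_pool, p=0.5, rng=None):
    """Replace each entity mention, with probability p, by another mention of the same type.

    mention_pool maps an entity type to a list of non-empty token lists
    (a hypothetical structure chosen for this sketch).
    """
    rng = rng or random.Random()
    new_tokens, new_tags = [], []
    cursor = 0
    for start, end, etype in extract_mentions(tags):
        # Copy the tokens between the previous mention and this one unchanged.
        new_tokens.extend(tokens[cursor:start])
        new_tags.extend(tags[cursor:start])
        candidates = mention_pool.get(etype, [])
        if candidates and rng.random() < p:
            replacement = rng.choice(candidates)
        else:
            replacement = tokens[start:end]
        # Re-tag the (possibly different-length) mention in BIO format.
        new_tokens.extend(replacement)
        new_tags.extend([f"B-{etype}"] + [f"I-{etype}"] * (len(replacement) - 1))
        cursor = end
    new_tokens.extend(tokens[cursor:])
    new_tags.extend(tags[cursor:])
    return new_tokens, new_tags


tokens = ["We", "use", "Penn", "Treebank", "here"]
tags = ["O", "O", "B-DATASET", "I-DATASET", "O"]
pool = {"DATASET": [["ImageNet"]]}
aug_tokens, aug_tags = mention_replacement(tokens, tags, pool, p=1.0, rng=random.Random(0))
```

Tokens tagged "O" pass through untouched, which matches the statement that MR applies only to tokens tagged with specific entity types.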