A Comparative Study of Sequence Tagging Methods for Domain Knowledge Entity Recognition in Biomedical Papers

Wu, Jian; Hoque, Reshad Ul; Reiske, Gunnar W.; Weigle, Michele C.; Bradshaw, Brenda T.; Gaff, Holly; Jiāng, Lì; Kwan, Chiman

doi:10.1145/3383583.3398602

Cited by 3 publications

(1 citation statement)

References 16 publications


“…Named Entity Recognition (NER) aims at recognizing mentions of rigid designators from text belonging to predefined semantic types such as person, location, and organization [6]. In general, the entities appearing in natural language can be beyond the scope of these named entities, such as domain knowledge entities [7], biomedical entities, and materials compositions [8]. A simple rule-based extractor such as a grammar-based noun phrase chunker does not generalize well because the text span of an object name or an aspect can be a subphrase or a superphrase of another phrase.…”

Section: Introduction (mentioning)

confidence: 99%

Visual descriptor extraction from patent figure captions

Wei, Ajayi, et al. 2022

Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries

Self Cite


Technical drawings used for illustrating designs are ubiquitous in patent documents, especially design patents. Different from natural images, these drawings are usually made using black strokes with little color information, making it challenging for models trained on natural images to recognize objects. To facilitate indexing and searching, we propose an effective and efficient visual descriptor model that extracts object names and aspects from patent captions to annotate benchmark patent figure datasets. We compared two state-of-the-art named entity recognition (NER) models and found that with a limited number of annotated samples, the BiLSTM-CRF model outperforms the Transformer model by a significant margin, achieving an overall F1=96.60%. We further conducted a data efficiency study by varying the number of training samples and found that BiLSTM consistently beats the transformer model on our task. The proposed model is used to annotate a benchmark patent figure dataset.

CCS Concepts: Computing methodologies → Information extraction.
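The abstract reports an overall F1 score for a sequence tagger, and the citation statement notes that an object name can be a subphrase or superphrase of another phrase. The sketch below shows one common way such a score can be computed: decode BIO tags into spans and count exact span matches. It is a minimal illustration, not the authors' evaluation code. The label names `OBJ` and `ASPECT`, the example caption, and the exact-match criterion are all assumptions made here.

```python
from typing import List, Set, Tuple

# A span is (label, start_index, end_index_exclusive).
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode a BIO tag sequence into a set of labeled token spans."""
    spans: Set[Span] = set()
    label, start = None, 0
    # A trailing "O" sentinel flushes a span that ends at the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.add((label, start, i))
            label = None
        # A stray "I-" with no open span is treated as the start of a span.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            label, start = tag[2:], i
    return spans


def span_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall, and F1 for one tagged sequence."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical caption and labels, chosen only for illustration.
tokens = ["front", "perspective", "view", "of", "a", "folding", "chair"]
gold = ["B-ASPECT", "I-ASPECT", "I-ASPECT", "O", "O", "B-OBJ", "I-OBJ"]
# The prediction tags only "chair", a subphrase of the gold "folding chair".
pred = ["B-ASPECT", "I-ASPECT", "I-ASPECT", "O", "O", "O", "B-OBJ"]

precision, recall, f1 = span_f1(gold, pred)
```

Under exact matching, the predicted span "chair" earns no credit against the gold span "folding chair", so only the aspect span counts as a true positive. This is the kind of boundary error that the subphrase and superphrase remark in the citation statement points to.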

