2019
DOI: 10.1177/1460458219869490

Recognizing software names in biomedical literature using machine learning

Abstract: Software tools now are essential to research and applications in the biomedical domain. However, existing software repositories are mainly built using manual curation, which is time-consuming and unscalable. This study took the initiative to manually annotate software names in 1,120 MEDLINE abstracts and titles and used this corpus to develop and evaluate machine learning–based named entity recognition systems for biomedical software. Specifically, two strategies were proposed for feature engineering: (1) doma…

Cited by 8 publications (4 citation statements)
References: 16 publications
“…Increasingly publishers are providing XML versions of article text, and XML is much more suitable for machine processing, but coverage remains only partial and uneven across domains, especially in combination with open access licensing, requiring contract negotiations. Thus, existing NER in scholarly text are often limited to article abstracts or sections (Ozyurt et al, 2016; Schindler et al, 2020; Wei et al, 2020), rather than enabling full‐scale “distant reading” for bibliometric and text analysis of the literature (Mehta, Bradley, Hancock, & Collins, 2017). While XML is increasingly available from traditional publishers, we think it is unlikely that standardized XML will ever achieve full coverage, especially considering the growth of open access pre‐print and institutional repositories.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…These features can be used together to improve one extraction model. For example, Wei et al (2020) used the bag-of-word, word shape, morphological information, POS tag, domain knowledge, and word embedding features to optimize the performance of the CRF model in software entity extraction. Besides optimizing the single model, combining different features and models is another method used to acquire better extraction models (Liakata et al, 2012;Liakata & Soldatova, 2008).…”
Section: Statistical Machine Learning-based Extraction Methods (citation type: mentioning)
confidence: 99%
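The feature types named in the statement above (bag-of-words, word shape, morphological information) can be illustrated with a small sketch. This is not the feature set used by Wei et al.; the function names `word_shape` and `token_features`, the feature keys, the three-character affix length, and the example sentence are all assumptions chosen for illustration. POS tags, domain knowledge, and word embeddings are left out because they need external resources.

```python
def word_shape(token: str) -> str:
    """Collapse a token to its shape: uppercase -> 'X', lowercase -> 'x',
    digit -> 'd'; any other character is kept unchanged."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    return "".join(shape)


def token_features(tokens, i):
    """Build a feature dict for tokens[i] (hypothetical feature names)."""
    token = tokens[i]
    return {
        # bag-of-words style lexical feature
        "word.lower": token.lower(),
        # word shape feature
        "shape": word_shape(token),
        # simple morphological features: character prefix and suffix
        "prefix3": token[:3],
        "suffix3": token[-3:],
        "has_digit": any(ch.isdigit() for ch in token),
        # context window of one token on each side
        "prev.lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


# Illustrative sentence (an assumption, not taken from the annotated corpus).
tokens = ["We", "used", "BLAST", "2.0", "for", "alignment"]
features = [token_features(tokens, i) for i in range(len(tokens))]
```

One dict per token is a common input format for CRF toolkits; a sequence of such dicts, paired with per-token labels, could then be passed to a CRF trainer.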
“…The algorithm was applied to the keyword autocompletion effect to achieve good results. (9) Duanmu and Xing used a remotely supervised method to complete the acquisition and preprocessing of data to reduce errors caused by manual operations. Also, a semi-automatic method of collecting financial KG data was proposed.…”
Section: Literature Review (citation type: mentioning)
confidence: 99%