Proceedings of the 18th BioNLP Workshop and Shared Task 2019
DOI: 10.18653/v1/w19-5035

Improving Chemical Named Entity Recognition in Patents with Contextualized Word Embeddings

Abstract: Chemical patents are an important resource for chemical information. However, few chemical Named Entity Recognition (NER) systems have been evaluated on patent documents, due in part to their structural and linguistic complexity. In this paper, we explore the NER performance of a BiLSTM-CRF model utilising pre-trained word embeddings, character-level word representations and contextualized ELMo word representations for chemical patents. We compare word embeddings pre-trained on biomedical and chemical patent co…

citations
Cited by 37 publications
(40 citation statements)
references
References 35 publications
“…PubMed+PMC. Pre-trained word embeddings from PubMed and PMC with 200 dimensions have been used widely for NER tasks in the biological and biomedical domain [23]. These embeddings are generated using the Word2vec model [14] in the word2vec binary format from a collection of PubMed articles.…”
Section: Encoding Formats
mentioning
confidence: 99%
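
The statement above refers to embeddings stored in the word2vec binary format. As a rough illustration only, the sketch below reads and writes that layout using the Python standard library. It assumes the commonly used layout: an ASCII header line with the vocabulary size and dimension, then each word followed by a space and its little-endian float32 values. The function names and the 4-dimensional toy vectors are hypothetical and stand in for the 200-dimensional PubMed+PMC file, which is not reproduced here.

```python
import io
import struct


def write_word2vec_binary(stream, vectors):
    """Write {word: [floats]} using the assumed word2vec binary layout."""
    dim = len(next(iter(vectors.values())))
    # Header: "<vocab_size> <dim>\n" in plain ASCII.
    stream.write(f"{len(vectors)} {dim}\n".encode("utf-8"))
    for word, vec in vectors.items():
        stream.write(word.encode("utf-8") + b" ")
        stream.write(struct.pack(f"<{dim}f", *vec))  # little-endian float32
        stream.write(b"\n")


def read_word2vec_binary(stream):
    """Read the assumed layout back into ({word: [floats]}, dim)."""
    header = stream.readline().decode("utf-8").split()
    vocab_size, dim = int(header[0]), int(header[1])
    vectors = {}
    for _ in range(vocab_size):
        chars = bytearray()
        while True:
            ch = stream.read(1)
            if ch == b"":
                raise EOFError("file ended while reading a word")
            if ch == b" ":
                break
            if ch != b"\n":  # skip the newline left over from the previous entry
                chars.extend(ch)
        raw = stream.read(4 * dim)
        if len(raw) != 4 * dim:
            raise EOFError("file ended while reading a vector")
        vectors[chars.decode("utf-8")] = list(struct.unpack(f"<{dim}f", raw))
    return vectors, dim


# Toy round trip with made-up vectors (values chosen to be exact in float32).
toy = {
    "benzene": [0.5, -1.25, 2.0, 0.0],
    "ethanol": [1.0, 0.25, -0.5, 4.0],
}
buffer = io.BytesIO()
write_word2vec_binary(buffer, toy)
buffer.seek(0)
loaded, dim = read_word2vec_binary(buffer)
print(dim, sorted(loaded))
```

In practice a library loader would normally be used instead; gensim's `KeyedVectors.load_word2vec_format(path, binary=True)` reads files of this kind.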
“…[57,63,64] Even now, an enormous amount of untapped information remains housed in laboratory notebooks and journal articles. For such information to be directly usable, someone must undertake the challenge of compiling the data into an accessible, user-friendly format and overcome any intellectual property restrictions. Image and natural language processing techniques can make this task less burdensome; thus there is increasing interest in applying such techniques to the chemical sciences. [65][66][67][68][69][70] Autonomous discovery systems rely on a variety of computational tools to generate hypotheses from data without human intervention. This includes both the software that makes the recommendations (e.g., proposes correlations, regresses models, selects experiments) as well as the underlying hardware that makes using the software tractable. Our discussion of the advances in this area focuses on software developments with an emphasis on machine learning algorithms that have elicited cross-disciplinary excitement.…”
Section: Enabling Factors
mentioning
confidence: 99%
“…We have begun preparing the corpus and will make available strong baselines for the tasks. Initial publications related to the data and Task 1 appear at the 2019 ALTA and BioNLP workshops, respectively [18,19].…”
Section: Data and Evaluation
mentioning
confidence: 99%
“…To support teams who are interested in Task 2 only, a pre-trained chemical NER tagger is provided as a resource [19].…”
Section: Data and Evaluation
mentioning
confidence: 99%