2018
DOI: 10.1186/s13321-018-0280-0

Putting hands to rest: efficient deep CNN-RNN architecture for chemical named entity recognition with no hand-crafted rules

Abstract: Chemical named entity recognition (NER) is an active field of research in biomedical natural language processing. To facilitate the development of new and superior chemical NER systems, BioCreative released the CHEMDNER corpus, an extensive dataset of diverse manually annotated chemical entities. Most of the systems trained on the corpus rely on complicated hand-crafted rules or curated databases for data preprocessing, feature extraction and output post-processing, though modern machine learning algorithms, s…
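The authors' released code is not reproduced here; the snippet below is only a rough sketch of the kind of character-level CNN plus recurrent tagger the abstract describes, assuming PyTorch, with illustrative layer sizes and class/parameter names of my own choosing.

```python
# Minimal sketch (not the paper's implementation): a character-level CNN builds
# token representations and a bidirectional LSTM tags the token sequence, with no
# hand-crafted features. All sizes and names are illustrative assumptions.
import torch
import torch.nn as nn

class CharCnnRnnTagger(nn.Module):
    def __init__(self, n_chars=128, char_dim=32, n_filters=64, hidden=128, n_tags=3):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        # 1-D convolution over the characters of each token
        self.conv = nn.Conv1d(char_dim, n_filters, kernel_size=3, padding=1)
        # bidirectional LSTM over the sequence of token representations
        self.rnn = nn.LSTM(n_filters, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_tags)  # per-token tag scores (e.g. B/I/O)

    def forward(self, char_ids):
        # char_ids: (batch, tokens, chars_per_token) integer-encoded characters
        b, t, c = char_ids.shape
        x = self.char_emb(char_ids.view(b * t, c))        # (b*t, chars, char_dim)
        x = self.conv(x.transpose(1, 2))                  # (b*t, n_filters, chars)
        x = torch.max(x, dim=2).values.view(b, t, -1)     # max-pool chars -> token vectors
        h, _ = self.rnn(x)                                # contextualise tokens
        return self.out(h)                                # (batch, tokens, n_tags)

# Usage: scores = CharCnnRnnTagger()(torch.randint(1, 128, (2, 10, 15)))
```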

Cited by 46 publications (34 citation statements)
References 23 publications

“…The embedding model with the best results was Bio NPLAB, which contains embeddings extracted from a biomedical corpus and comprises around five million embeddings. The fact that tokenization is the biggest issue in this model confirms the concerns of [48] regarding the creation of proper tokenizers for NER tasks on biomedical corpora. In this case, it would be interesting to see in the future whether another tokenizer could achieve better results, or whether it would be necessary to introduce hand-crafted rules into this model to achieve the same performance as the character model.…”
Section: Bio NPLAB GloVe (supporting)
confidence: 67%
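The tokenization concern raised in that statement is easy to see on a concrete string. The snippet below is a toy illustration (not code from either paper) of how a generic word tokenizer fragments a systematic chemical name, which is exactly what character-level input sidesteps.

```python
# Toy illustration of why tokenization is hard for chemical NER: a generic
# regex word tokenizer shatters a systematic name into many small pieces,
# whereas a character-level model consumes the raw string directly.
import re

text = "Cells were treated with 2-amino-3-(3,4-dihydroxyphenyl)propanoic acid."
word_tokens = re.findall(r"\w+|[^\w\s]", text)
print(word_tokens)        # the chemical name is split across many tokens
char_input = list(text)   # a character-level model just sees the character sequence
```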
“…These results lead us to believe that BI-LSTM-CRF is the best architecture when dealing with RNNs. Putting hands to rest: efficient deep CNN-RNN architecture for chemical named entity recognition with no hand-crafted rules. This article [48] describes work closer to our own. The authors note that most systems performing NER tasks rely on hand-crafted rules or curated databases for data preprocessing, feature extraction and output post-processing, even though modern machine learning algorithms, such as deep neural networks, can derive such rules automatically with little to no human intervention.…”
Section: Bidirectional LSTM-CRF Models for Sequence Tagging (mentioning)
confidence: 99%
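Since that statement singles out BiLSTM-CRF, a brief sketch of the part that distinguishes it from a plain softmax tagger may help: Viterbi decoding over per-token emission scores and learned tag-transition scores. The function below is an illustrative NumPy implementation, not code from [48] or the citing paper; in the cited models the emission and transition scores come from the BiLSTM and the CRF layer.

```python
# Viterbi decoding: instead of an argmax per token, find the globally best tag
# path under emission scores (from the BiLSTM) and transition scores (from the CRF).
import numpy as np

def viterbi_decode(emissions, transitions):
    """emissions: (T, K) per-token tag scores; transitions: (K, K) score of tag i -> tag j."""
    T, K = emissions.shape
    score = emissions[0].copy()                # best score of paths ending in each tag
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transitions + emissions[t][None, :]  # (K, K) path scores
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    best = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]                          # best tag index per token

# Example with 3 tags (O, B, I): the transition matrix discourages O -> I.
tags = viterbi_decode(np.random.randn(6, 3),
                      np.array([[0., 0., -5.], [0., 0., 1.], [0., 0., 1.]]))
```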
“…For instance, many named entity recognition methods have been applied to the detection of chemical entities (compound names and formulas) in text (see, for instance, refs. 11–15, as well as ref. 9 for an extensive review).…”
mentioning
confidence: 83%
“…Zhao et al. [25] proposed a multiple label strategy (MLS) that can replace the CRF layer of a deep neural network for detecting spans of disease names. Korvigo et al. [26] applied a CNN-RNN network to recognize spans of chemicals, and Luo et al. (2018) [28] proposed an attention-based bidirectional LSTM with a CRF to detect spans of chemicals. Unanue et al. (2017) [29] used a bidirectional LSTM with a CRF to detect spans of drug names and clinical concepts, while Lyu et al. (2017) [27] proposed a bidirectional LSTM-RNN model for detecting spans of a variety of biomedical concepts.…”
Section: Related Work (mentioning)
confidence: 99%
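All of the approaches listed in that statement reduce span detection to per-token tagging. A small helper like the following (an illustrative sketch, not taken from any of the cited papers) converts a BIO tag sequence back into entity spans for evaluation.

```python
# Convert BIO tags into (start, end, type) spans, with end exclusive.
def bio_to_spans(tags):
    """tags: list like ['O', 'B-CHEM', 'I-CHEM', 'O'] -> [(1, 3, 'CHEM')]."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):             # sentinel flushes an open span
        if tag.startswith("B-") or tag == "O":
            if start is not None:
                spans.append((start, i, label))
                start, label = None, None
            if tag.startswith("B-"):
                start, label = i, tag[2:]
        elif tag.startswith("I-") and start is None:    # tolerate an I- without a B-
            start, label = i, tag[2:]
    return spans

print(bio_to_spans(["O", "B-CHEM", "I-CHEM", "O", "B-DIS"]))  # [(1, 3, 'CHEM'), (4, 5, 'DIS')]
```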