2018
DOI: 10.1186/s13321-018-0280-0

Putting hands to rest: efficient deep CNN-RNN architecture for chemical named entity recognition with no hand-crafted rules

Abstract: Chemical named entity recognition (NER) is an active field of research in biomedical natural language processing. To facilitate the development of new and superior chemical NER systems, BioCreative released the CHEMDNER corpus, an extensive dataset of diverse manually annotated chemical entities. Most of the systems trained on the corpus rely on complicated hand-crafted rules or curated databases for data preprocessing, feature extraction and output post-processing, though modern machine learning algorithms, s…
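The authors' released code is not reproduced here; the snippet below is only a rough sketch of the kind of character-level CNN plus recurrent tagger the abstract describes, assuming PyTorch, with illustrative layer sizes and class/parameter names of my own choosing.

```python
# Minimal sketch (not the paper's implementation): a character-level CNN builds
# token representations and a bidirectional LSTM tags the token sequence, with no
# hand-crafted features. All sizes and names are illustrative assumptions.
import torch
import torch.nn as nn

class CharCnnRnnTagger(nn.Module):
    def __init__(self, n_chars=128, char_dim=32, n_filters=64, hidden=128, n_tags=3):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        # 1-D convolution over the characters of each token
        self.conv = nn.Conv1d(char_dim, n_filters, kernel_size=3, padding=1)
        # bidirectional LSTM over the sequence of token representations
        self.rnn = nn.LSTM(n_filters, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_tags)  # per-token tag scores (e.g. B/I/O)

    def forward(self, char_ids):
        # char_ids: (batch, tokens, chars_per_token) integer-encoded characters
        b, t, c = char_ids.shape
        x = self.char_emb(char_ids.view(b * t, c))        # (b*t, chars, char_dim)
        x = self.conv(x.transpose(1, 2))                  # (b*t, n_filters, chars)
        x = torch.max(x, dim=2).values.view(b, t, -1)     # max-pool chars -> token vectors
        h, _ = self.rnn(x)                                # contextualise tokens
        return self.out(h)                                # (batch, tokens, n_tags)

# Usage: scores = CharCnnRnnTagger()(torch.randint(1, 128, (2, 10, 15)))
```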

Cited by 46 publications (34 citation statements)
References 23 publications

“…The embedding model with the best results was Bio NPLAB, which contains embeddings extracted from a biomedical corpus and comprises around five million embeddings. The fact that tokenization is the biggest issue in this model confirms the concerns of [48] regarding the creation of proper tokenizers for NER tasks on biomedical corpora. In this case, it would be interesting to see in the future whether another tokenizer could achieve better results, or whether it would be necessary to introduce hand-crafted rules into this model to achieve the same performance as the character model.…”
Section: Bio NPLAB GloVe (supporting)
confidence: 67%
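The tokenization concern raised in that statement is easy to see on a concrete string. The snippet below is a toy illustration (not code from either paper) of how a generic word tokenizer fragments a systematic chemical name, which is exactly what character-level input sidesteps.

```python
# Toy illustration of why tokenization is hard for chemical NER: a generic
# regex word tokenizer shatters a systematic name into many small pieces,
# whereas a character-level model consumes the raw string directly.
import re

text = "Cells were treated with 2-amino-3-(3,4-dihydroxyphenyl)propanoic acid."
word_tokens = re.findall(r"\w+|[^\w\s]", text)
print(word_tokens)        # the chemical name is split across many tokens
char_input = list(text)   # a character-level model just sees the character sequence
```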
“…These results lead us to believe that BI-LSTM-CRF is the best architecture when dealing with RNNs. Putting hands to rest: efficient deep CNN-RNN architecture for chemical named entity recognition with no hand-crafted rules. This article [48] describes work closer to our own. The authors note that most systems performing NER tasks rely on hand-crafted rules or curated databases for data preprocessing, feature extraction and output post-processing, even though modern machine learning algorithms, such as deep neural networks, can derive such rules automatically with little to no human intervention.…”
Section: Bidirectional LSTM-CRF Models for Sequence Tagging (mentioning)
confidence: 99%
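Since that statement singles out BiLSTM-CRF, a brief sketch of the part that distinguishes it from a plain softmax tagger may help: Viterbi decoding over per-token emission scores and learned tag-transition scores. The function below is an illustrative NumPy implementation, not code from [48] or the citing paper; in the cited models the emission and transition scores come from the BiLSTM and the CRF layer.

```python
# Viterbi decoding: instead of an argmax per token, find the globally best tag
# path under emission scores (from the BiLSTM) and transition scores (from the CRF).
import numpy as np

def viterbi_decode(emissions, transitions):
    """emissions: (T, K) per-token tag scores; transitions: (K, K) score of tag i -> tag j."""
    T, K = emissions.shape
    score = emissions[0].copy()                # best score of paths ending in each tag
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transitions + emissions[t][None, :]  # (K, K) path scores
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    best = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]                          # best tag index per token

# Example with 3 tags (O, B, I): the transition matrix discourages O -> I.
tags = viterbi_decode(np.random.randn(6, 3),
                      np.array([[0., 0., -5.], [0., 0., 1.], [0., 0., 1.]]))
```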
“…For instance, many named entity recognition methods have been applied to the detection of chemical entities (compound names and formulas) in text (see, for instance, refs. 11–15, as well as ref. 9 for an extensive review).…”
mentioning
confidence: 83%
“…Zhao et al. [25] proposed a multiple label strategy (MLS) that can replace the CRF layer of a deep neural network for detecting spans of disease names. Korvigo et al. [26] applied a CNN-RNN network to recognize spans of chemicals, and Luo et al. (2018) [28] proposed an attention-based bidirectional LSTM with a CRF to detect spans of chemicals. Unanue et al. (2017) [29] used a bidirectional LSTM with a CRF to detect spans of drug names and clinical concepts, while Lyu et al. (2017) [27] proposed a bidirectional LSTM-RNN model for detecting spans of a variety of biomedical concepts.…”
Section: Related Work (mentioning)
confidence: 99%
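All of the approaches listed in that statement reduce span detection to per-token tagging. A small helper like the following (an illustrative sketch, not taken from any of the cited papers) converts a BIO tag sequence back into entity spans for evaluation.

```python
# Convert BIO tags into (start, end, type) spans, with end exclusive.
def bio_to_spans(tags):
    """tags: list like ['O', 'B-CHEM', 'I-CHEM', 'O'] -> [(1, 3, 'CHEM')]."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):             # sentinel flushes an open span
        if tag.startswith("B-") or tag == "O":
            if start is not None:
                spans.append((start, i, label))
                start, label = None, None
            if tag.startswith("B-"):
                start, label = i, tag[2:]
        elif tag.startswith("I-") and start is None:    # tolerate an I- without a B-
            start, label = i, tag[2:]
    return spans

print(bio_to_spans(["O", "B-CHEM", "I-CHEM", "O", "B-DIS"]))  # [(1, 3, 'CHEM'), (4, 5, 'DIS')]
```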