Evaluating Named-Entity Recognition approaches in plant molecular biology

Do, Huy; Than, Khoat; Larmande, Pierre

doi:10.1101/360966

Cited by 9 publications

(8 citation statements)

References 10 publications

“…Conditional Random Fields (CRF) [33] are one of the most widely used generative classifiers intended to address NER tasks [61,62,63] as long as their focus is on sequential data. To predict named entity tags, a word-level examination is conducted with a set of sorted and sequential words mapped with an internal state of transitions produced by their corresponding entity tags.…”

Section: Proposed Methodology

mentioning

confidence: 99%

Using Twitter Data to Monitor Natural Disaster Social Dynamics: A Recurrent Neural Network Approach with Word Embeddings and Kernel Density Estimation

Hernandez-Suarez, Sánchez-Pérez, Toscano-Medina, et al. 2019

Sensors

In recent years, Online Social Networks (OSNs) have received a great deal of attention for their potential use in the spatial and temporal modeling of events owing to the information that can be extracted from these platforms. Within this context, one of the most latent applications is the monitoring of natural disasters. Vital information posted by OSN users can contribute to relief efforts during and after a catastrophe. Although it is possible to retrieve data from OSNs using embedded geographic information provided by GPS systems, this feature is disabled by default in most cases. An alternative solution is to geoparse specific locations using language models based on Named Entity Recognition (NER) techniques. In this work, a sensor that uses Twitter is proposed to monitor natural disasters. The approach is intended to sense data by detecting toponyms (named places written within the text) in tweets with event-related information, e.g., a collapsed building on a specific avenue or the location at which a person was last seen. The proposed approach is carried out by transforming tokenized tweets into word embeddings: a rich linguistic and contextual vector representation of textual corpora. Pre-labeled word embeddings are employed to train a Recurrent Neural Network variant, known as a Bidirectional Long Short-Term Memory (biLSTM) network, that is capable of dealing with sequential data by analyzing information in both directions of a word (past and future entries). Moreover, a Conditional Random Field (CRF) output layer, which aims to maximize the transition from one NER tag to another, is used to increase the classification accuracy. The resulting labeled words are joined to coherently form a toponym, which is geocoded and scored by a Kernel Density Estimation function. At the end of the process, the scored data are presented graphically to depict areas in which the majority of tweets reporting topics related to a natural disaster are concentrated. 
A case study on Mexico’s 2017 Earthquake is presented, and the data extracted during and after the event are reported.

“…The proposed model uses a hybrid neural network technique BiLSTM-CRF (bidirectional long short term memory -Conditional Random Field) and provides better performance than baseline models. Similarly, Do et al [18] has also proposed a hybrid NER technique for the identification of named entities in plant molecular biology dataset. The proposed technique identifies genes, proteins and phenotypic Traits as named-entities.…”

Section: Named Entity Recognition Techniques

mentioning

confidence: 99%

“…However, Etaiwi et al [23] proposed a model for the identification of Arabic Names and states that the CRF is very efficient in identifying NER problem. However, the recent studies [16,18,32] concluded that the use of CRF alone will not give a better performance. The amalgamation of CRF and neural networks will provide efficient performance.…”

Section: MAIM Training and Aspect Identification Phase

mentioning

confidence: 99%

“…However, according to a comparative study[57], CNNs is not suitable for NER problem, whereas, it reassures that the invariants of RNNs (Recurrent Neural Networks) are suitable for NER problem. Though NER problem has been extensively studied in information retrieval field[16,18,32] and it is in progress as the topic of research. Yet, there is a need to include these techniques into sentiment analysis domain.…”

mentioning

confidence: 99%

Movie Aspects Identification Model for Aspect Based Sentiment Analysis

Mir, Mahmood 2020

ITC

Aspect Based Sentiment Analysis techniques have been applied in several application domains. From the last two decades, these techniques have been developed mostly for product and service application domains. However, very few aspect-based sentiment techniques have been proposed for the movie application domain. Moreover, these techniques only mine specific aspects (Script, Director, and Actor) of a movie application domain, nevertheless, the movie application domain is more complex than the product and service application domain. Since, it contains NER (Named Entity Recognition) problem and it cannot be ignored, since there is an opinion often associated with it. Consequently, in this paper MAIM (Movie Aspect Identification Model) is proposed that can extract not only movie specific aspects, also identifies NEs (Named Entities) such as Person Name and Movie Title. The three main contributions are 1) the identification of infrequent aspects, 2) the identification of NE (named entity) in movie application domain, 3) identifying N-gram opinion words as an entity. MAIM incorporates the BiLSTM-CRF hybrid technique and is implemented on the movie application domain having precision 89.9%, recall 88.9% and f1-measure 89.4%. The experimental results show that MAIM performs better than baseline models CRF and LSTM-CRF.

“…By sharing this dataset on the PubAnnotation platform and be available at the BioNLP Open Shared Tasks (BioNLP-OST, https://2019.bionlp-ost.org), we invited participants to implement their own methods to solve NER tasks for this dataset. Furthermore, to evaluate the performances, we compared their approaches, implemented during the task, with our method [6], implemented before the hackathon.…”

Section: Objective

mentioning

confidence: 99%

OryzaGP: rice gene and protein dataset for named-entity recognition

Larmande, Wang 2019

Genomics Inform

Self Cite

2019, Korea Genome Organization. This is an open-access article distributed under the terms of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Text mining has become an important research method in biology, with its original purpose to extract biological entities, such as genes, proteins and phenotypic traits, to extend knowledge from scientific papers. However, few thorough studies on text mining and application development, for plant molecular biology data, have been performed, especially for rice, resulting in a lack of datasets available to solve named-entity recognition tasks for this species. Since there are rare benchmarks available for rice, we faced various difficulties in exploiting advanced machine learning methods for accurate analysis of the rice literature. To evaluate several approaches to automatically extract information from gene/protein entities, we built a new dataset for rice as a benchmark. This dataset is composed of a set of titles and abstracts, extracted from scientific papers focusing on the rice species, and is downloaded from PubMed. During the 5th Biomedical Linked Annotation Hackathon, a portion of the dataset was uploaded to PubAnnotation for sharing. Our ultimate goal is to offer a shared task of rice gene/protein name recognition through the BioNLP Open Shared Tasks framework using the dataset, to facilitate an open comparison and evaluation of different approaches to the task.
