File Training Generator For Indonesian Language In Named Entity Recognition Using Anago Library

Fadil, Irfan; Yuniarto, Dwi; Firmansyah, Esa; Herdiana, Dody; Supriadi, Fidi; Rahman, Ali

doi:10.4108/eai.11-7-2019.2297618

Cited by 2 publications

(2 citation statements)

References 1 publication


“…Developed and optimized by Nakayama in 2017 with the combined technique BiLSTM-CRF [26], anaGo was implemented in Keras for NER and many other sequence labeling tasks. anaGo implements different pre-trained word embeddings as input; it also has the capability to self-generate word embedding based on training data [12,27]. The BiLSTM-CRF architecture is described in Fig.…”

Section: Models (mentioning)

confidence: 99%


Extract antibody and antigen names from biomedical literature

et al. 2022


Background: Antibodies and antigens play an indispensable role in targeted diagnosis, therapy, and biomedical discovery. In addition, massive numbers of new scientific articles about antibodies and/or antigens are published each year; they are a precious knowledge resource that has yet to be exploited to its full potential. We therefore aim to develop a biomedical natural language processing tool that can automatically identify antibody and antigen entities in articles.

Results: We first annotated an antibody-antigen corpus of 3210 relevant PubMed abstracts using a semi-automatic approach. The Inter-Annotator Agreement score among 3 annotators ranges from 91.46% to 94.31%, indicating that the annotations are consistent and the corpus is reliable. We then used the corpus to develop and optimize BiLSTM-CRF-based and BioBERT-based models. The models achieved overall F1 scores of 62.49% and 81.44%, respectively, which showed potential for newly studied entities. The two models served as the foundation for a named entity recognition (NER) tool that automatically recognizes antibody and antigen names in biomedical literature.

Conclusions: Our antibody-antigen NER models enable users to automatically extract antibody and antigen names from scientific articles without manually scanning through vast amounts of data and information in the literature. The output of NER can be used to automatically populate antibody-antigen databases, support antibody validation, and help researchers find the most appropriate antibodies of interest. The packaged NER model is available at https://github.com/TrangDinh44/ABAG_BioBERT.git.



“…Bi-LSTM uses two LSTM networks, forward (f1-4) and backward (b1-4). The vector representations from both networks are concatenated (c1-4) and inputted to the CRF tagging layer for label assignment [12,13,27]. The model consists of 10 layers with over 2 M parameters.…”

Section: Models (mentioning)

confidence: 99%
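
The bidirectional step described in the citation statement above can be illustrated with a minimal NumPy sketch. It is not the anaGo or the citing paper's implementation: the weights are random, the sizes (4 tokens, 8-dimensional embeddings, 5 hidden units) are arbitrary assumptions, the helper names are hypothetical, and the CRF tagging layer is omitted. It only shows a forward pass, a backward pass, and the per-token concatenation of the two.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(rng, input_size, hidden_size):
    """Random weights for one LSTM direction (illustrative, untrained)."""
    W = rng.normal(scale=0.1, size=(4 * hidden_size, input_size))
    U = rng.normal(scale=0.1, size=(4 * hidden_size, hidden_size))
    b = np.zeros(4 * hidden_size)
    return W, U, b

def lstm_pass(inputs, params, hidden_size):
    """Run one LSTM direction over a sequence; return the hidden state per step."""
    W, U, b = params
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    outputs = []
    for x in inputs:
        z = W @ x + U @ h + b
        i, f, o, g = np.split(z, 4)       # input, forget, output gates and candidate
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g                 # update the cell state
        h = o * np.tanh(c)                # new hidden state
        outputs.append(h)
    return np.stack(outputs)

def bilstm_features(inputs, fwd_params, bwd_params, hidden_size):
    """Concatenate forward and backward hidden states for every token."""
    forward = lstm_pass(inputs, fwd_params, hidden_size)
    # Read the sequence right to left, then flip back so rows align with tokens.
    backward = lstm_pass(inputs[::-1], bwd_params, hidden_size)[::-1]
    return np.concatenate([forward, backward], axis=1)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))          # 4 tokens, 8-dim embeddings (assumed sizes)
fwd = init_params(rng, input_size=8, hidden_size=5)
bwd = init_params(rng, input_size=8, hidden_size=5)
features = bilstm_features(tokens, fwd, bwd, hidden_size=5)
print(features.shape)                     # one 10-dim vector per token
```

Each row of `features` is the concatenated vector for one token; in a BiLSTM-CRF model these rows would be passed to the CRF layer, which assigns the label sequence.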