Text annotation is the process of adding metadata to text and is used in tasks such as natural language processing (NLP) and machine learning. Named entity recognition (NER) is one of the interesting and challenging tasks in NLP and is used extensively in many domains. NER can also be useful for handling documents, queries, reports and research articles related to agriculture, for example to identify pests affecting crops. spaCy, a free and open-source library used for NER, requires text data in a complex annotated format. Manual annotation is a difficult and time-consuming task. Therefore, to streamline text annotation, we developed an algorithm and a tool for automatic annotation of text data. Approximately 3.6 million queries were collected from “Kisan Call Centre”, a helpline service for farmers provided by the Government of India, and plant protection queries on paddy and wheat crops were extracted from this database. These queries were annotated with the developed tool to create an annotated corpus. The annotated corpus was used to train NER models that identify crops and their associated pests in the agriculture domain. Further, the performance of the model was enhanced by reducing features through plural-to-singular conversion and synonym substitution. The model achieved an F1-score of 97.20%, a significant improvement of 3.01% over the performance with the original queries.
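The annotation and normalization steps described above can be illustrated with a minimal sketch. It is not the authors' algorithm or tool: it assumes a simple dictionary lookup, and the term lists, labels (`CROP`, `PEST`), and helper names (`normalize`, `annotate`) are hypothetical. The output uses the `(text, {"entities": [(start, end, label)]})` layout commonly used for spaCy training examples; the abstract does not state the exact format.

```python
import re

# Hypothetical gazetteer: terms and labels are illustrative, not from the paper.
GAZETTEER = {
    "paddy": "CROP",
    "wheat": "CROP",
    "stem borer": "PEST",
    "aphid": "PEST",
}

# Hypothetical normalization tables for feature reduction.
PLURALS = {"aphids": "aphid", "stem borers": "stem borer"}
SYNONYMS = {"rice": "paddy"}


def normalize(text):
    """Lowercase, then apply plural-to-singular and synonym substitution."""
    text = text.lower()
    for table in (PLURALS, SYNONYMS):
        for src, dst in table.items():
            text = re.sub(r"\b" + re.escape(src) + r"\b", dst, text)
    return text


def annotate(text):
    """Return (text, {"entities": [(start, end, label), ...]})."""
    entities = []
    # Try longer terms first so multi-word terms win over their substrings.
    for term in sorted(GAZETTEER, key=len, reverse=True):
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            start, end = match.start(), match.end()
            # Skip spans overlapping an entity that was already recorded.
            if any(s < end and start < e for s, e, _ in entities):
                continue
            entities.append((start, end, GAZETTEER[term]))
    entities.sort()
    return (text, {"entities": entities})


example = annotate(normalize("Attack of aphids in rice crop"))
```

Normalizing before annotating means that "aphids" and "aphid", or "rice" and "paddy", collapse to a single surface form, which is one plausible reading of how the reported feature reduction works.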
Bibliographic data contain the information about a piece of literature that users need to recognize and retrieve it. Bibliometricians use these data quantitatively for analysis and dissemination, but with the increasing rate of publication through open access journals and publishers such as Nucleic Acids Research (NAR), Springer and Oxford Journals, it has become difficult to retrieve structured bibliographic information in the desired format. A digital bibliographic database contains the necessary, structured information about published literature, whereas the bibliographic records of individual articles are scattered and reside on different web pages. This thesis presents a retrieval system that brings the bibliographic data of NAR together in a single place. For this purpose, parser agents have been developed that access the web pages of NAR, parse the scattered bibliographic data, and store it in a local bibliographic database. On top of this database, a three-tier architecture is used to display the bibliographic information in a systematized format. Using this system, it would be possible to build networks between different authors and affiliations and to generate other analytical reports.
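A parser agent of the kind described above might look like the following minimal sketch. It is an assumption-laden illustration, not the thesis implementation: it assumes article pages expose `citation_*` meta tags (a common convention on journal sites, not confirmed for NAR by the abstract), it parses an inline placeholder page instead of fetching one, and the table layout and the names `MetaParser`, `parse_record`, and `store` are hypothetical.

```python
import sqlite3
from html.parser import HTMLParser

# Placeholder page: the title, authors and date are invented for illustration.
SAMPLE_PAGE = """
<html><head>
<meta name="citation_title" content="Example article title">
<meta name="citation_author" content="A. Author">
<meta name="citation_author" content="B. Author">
<meta name="citation_date" content="2015/01/28">
</head><body></body></html>
"""


class MetaParser(HTMLParser):
    """Collects the content of <meta name="citation_*"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name, content = attributes.get("name"), attributes.get("content")
        if name and name.startswith("citation_") and content:
            self.fields.setdefault(name, []).append(content)


def parse_record(html):
    """Turn one article page into a flat bibliographic record."""
    parser = MetaParser()
    parser.feed(html)
    return {
        "title": parser.fields.get("citation_title", [""])[0],
        "authors": "; ".join(parser.fields.get("citation_author", [])),
        "year": parser.fields.get("citation_date", [""])[0][:4],
    }


def store(conn, record):
    """Insert a record into a local bibliographic table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles (title TEXT, authors TEXT, year TEXT)"
    )
    conn.execute("INSERT INTO articles VALUES (:title, :authors, :year)", record)
    conn.commit()


record = parse_record(SAMPLE_PAGE)
conn = sqlite3.connect(":memory:")
store(conn, record)
rows = conn.execute("SELECT title, authors, year FROM articles").fetchall()
```

In a three-tier arrangement, this code would sit in the data tier; a separate application layer would query the `articles` table and a presentation layer would render the results.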