Easy access to multi‐taxa information (e.g. distribution, traits, diet) in the scientific literature is essential to understand, map and predict all‐inclusive biodiversity. Tools are needed to automatically extract useful information from the ever‐growing corpus of ecological texts and feed this information to open data repositories. A prerequisite is the ability to recognise mentions of taxa in text, a special case of named entity recognition (NER). In recent years, deep learning‐based NER systems have become ubiquitous, yielding state‐of‐the‐art results in the general and biomedical domains. However, no such tool is available to ecologists wishing to extract information from the biodiversity literature. We propose a new tool called TaxoNERD that provides deep neural network (DNN) models to recognise taxon mentions in ecological documents. To achieve high performance, these models usually need to be trained on a large corpus of manually annotated text. Creating such a corpus is a laborious and costly process, with the result that manually annotated corpora in the ecological domain tend to be too small to learn an accurate DNN model from scratch. To address this issue, we leverage existing models pretrained on large biomedical corpora using transfer learning. The performance of our models is evaluated on four corpora and compared to the most popular taxonomic entity recognition tools. Our experiments suggest that existing taxonomic NER tools are not suited to the extraction of ecological information from text as they performed poorly on ecologically oriented corpora, either because they do not take account of the variability of taxon naming practices or because they do not generalise well to the ecological domain. Conversely, a domain‐specific DNN‐based tool like TaxoNERD outperformed the other approaches on an ecological information extraction task. 
Efforts are needed to raise ecological information extraction to the same level of performance as its biomedical counterpart. One promising direction is to leverage the huge corpus of unlabelled ecological texts to learn a language representation model that could benefit downstream tasks. These efforts could be highly beneficial to ecologists in the long term.
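To make the task of taxon mention recognition concrete, the following is a minimal rule-based sketch and not the DNN approach used by TaxoNERD. It assumes a tiny, hypothetical genus lexicon (`KNOWN_GENERA`) and a hypothetical helper (`find_taxon_mentions`), and it only detects binomial-shaped mentions whose genus is in that lexicon. Vernacular names and abbreviated forms are missed, which illustrates why approaches that ignore the variability of taxon naming practices may perform poorly on ecological text.

```python
import re

# Hypothetical mini-lexicon of genus names, for illustration only;
# a real gazetteer would be far larger.
KNOWN_GENERA = {"Lumbricus", "Folsomia"}

# Shape of a Latin binomial: capitalised genus followed by a lowercase epithet.
BINOMIAL_SHAPE = re.compile(r"\b([A-Z][a-z]+) ([a-z]+)\b")


def find_taxon_mentions(text):
    """Return (start, end, mention) for binomial-shaped spans with a known genus."""
    mentions = []
    for match in BINOMIAL_SHAPE.finditer(text):
        # Keep the span only if the genus part appears in the lexicon.
        if match.group(1) in KNOWN_GENERA:
            mentions.append((match.start(), match.end(), match.group(0)))
    return mentions


if __name__ == "__main__":
    sentence = (
        "The earthworm Lumbricus terrestris and the springtail "
        "Folsomia candida feed on litter."
    )
    for start, end, mention in find_taxon_mentions(sentence):
        print(start, end, mention)
```

In this sketch the vernacular names "earthworm" and "springtail" are not recognised, because they match neither the binomial shape nor the lexicon.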
Although soil ecology has benefited from recent advances in describing soil organism trophic traits, large-scale reconstruction of soil food webs is still impeded by (1) the dispersal of most data about trophic interactions and diets across distributed, heterogeneous repositories, (2) the lack of a well-established terminology for describing feeding preferences, processes, and resource types, and (3) considerable heterogeneity in the classification of different soil groups, or the absence of such classifications. Soil trophic ecology could therefore benefit from standardisation efforts. Here, we propose the Soil Food Web Ontology as a new formal framework for representing knowledge on the trophic ecology of soil organisms. This ontology captures the semantics of trophic concepts, including consumer-resource interactions, feeding preferences and processes, and provides a formalisation of trophic group definitions. The ontology can be used to add semantic annotations to trophic data, thus facilitating the integration of heterogeneous datasets. It also provides lexical resources that can support the development of information extraction tools to facilitate the creation of literature-based datasets. Finally, it enables automatic and consistent classification of soil organisms based on their trophic relationships. We argue that, by harmonising the terminology and underlying concepts of soil trophic ecology, our ontology allows for better use of available information on the feeding habits of soil organisms and sounder classifications, thus facilitating the reconstruction of soil food webs and making food web research more accessible, reusable and reproducible.
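The idea of classifying organisms from their trophic relationships can be sketched in plain Python. This is only an illustrative stand-in, not the ontology itself, which would rely on formal class definitions and a reasoner. The interaction records, the group names, the resource sets and the rule used here (an organism belongs to a group when all of its recorded resources fall within that group's resource set) are assumptions made for the example.

```python
# Hypothetical consumer -> resources records, for illustration only.
INTERACTIONS = {
    "Lumbricus terrestris": {"leaf litter"},
    "Aphelenchus avenae": {"fungal hyphae"},
    "Hypoaspis aculeifer": {"nematodes", "springtails"},
}

# Hypothetical trophic group definitions: group name -> resources it covers.
GROUP_DEFINITIONS = {
    "detritivore": {"leaf litter", "dead wood"},
    "fungivore": {"fungal hyphae", "fungal spores"},
    "predator": {"nematodes", "springtails", "mites"},
}


def classify(resources):
    """Return the sorted groups whose resource set covers all given resources."""
    groups = sorted(
        name
        for name, allowed in GROUP_DEFINITIONS.items()
        # An empty record cannot support any classification.
        if resources and resources <= allowed
    )
    return groups or ["unclassified"]


if __name__ == "__main__":
    for organism, resources in INTERACTIONS.items():
        print(organism, classify(resources))
```

Because the same definitions are applied to every record, two datasets annotated with the same resource terms would receive the same group assignments, which is the kind of consistency the abstract describes.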