Background
Taxonomic descriptions are traditionally composed in natural language and published in a format that computers cannot use directly. The Exploring Taxon Concepts (ETC) project has been developing a set of web-based software tools that convert morphological descriptions published in telegraphic style into character data that can be reused and repurposed. This paper introduces what is, to our knowledge, the first semi-automated pipeline that converts morphological descriptions into taxon-character matrices to support systematics and evolutionary biology research. We then demonstrate and evaluate the use of the ETC Input Creation - Text Capture - Matrix Generation pipeline to generate body part measurement matrices from a set of 188 spider morphological descriptions, and we report the findings.
Results
From the given set of spider taxonomic publications, two versions of input (original and normalized) were generated and used by the ETC Text Capture and ETC Matrix Generation tools. The tools produced two corresponding spider body part measurement matrices, and the matrix from the normalized input was found to be much more similar to a gold standard matrix hand-curated by the scientist co-authors. The lower performance obtained with the original input was attributed to special conventions used in the original descriptions (e.g., the omission of measurement units). The results show that simple normalization of the description text greatly increased the quality of the machine-generated matrix and reduced edit effort. The machine-generated matrix also helped identify issues in the gold standard matrix.
Conclusions
ETC Text Capture and ETC Matrix Generation are low-barrier, effective tools for extracting measurement values from spider taxonomic descriptions, and they are more effective when the descriptions are self-contained. Special conventions that make the description text less self-contained challenge automated extraction of data from biodiversity descriptions and hinder automated reuse of the published knowledge. The tools will be updated to support new requirements revealed in this case study.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-016-1352-7) contains supplementary material, which is available to authorized users.
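The spider abstract reports that omitted measurement units hurt extraction and that simple normalization helped, but it does not spell out the normalization steps. As a minimal sketch, assuming a hypothetical convention in which a description states its unit once (e.g. "All measurements in mm.") and then gives bare numbers, a normalizer could copy that unit onto each value so every measurement is self-contained. The function name `make_self_contained`, the unit-statement wording, and the regular expressions are illustrative assumptions, not part of the ETC tools.

```python
import re

# Assumption (hypothetical): the unit is stated once, e.g. "All measurements in mm."
UNIT_NOTE = re.compile(r"All measurements (?:are )?in (mm|µm)\.?\s*", re.IGNORECASE)

# A decimal number that does not start mid-number, optionally followed by a unit.
NUMBER = re.compile(r"(?<![\d.])(\d+(?:\.\d+)?)(\s*(?:mm|µm)\b)?")


def make_self_contained(text: str) -> str:
    """Attach a globally stated unit to every bare number (sketch only).

    Counts such as "3 spines" would also receive a unit, so a real
    normalizer would need to restrict this to measurement phrases.
    """
    match = UNIT_NOTE.search(text)
    if match is None:
        return text  # no global unit statement: leave the text unchanged
    unit = match.group(1)
    # Drop the global statement; its information moves onto each value.
    body = UNIT_NOTE.sub("", text, count=1)

    def add_unit(m: re.Match) -> str:
        # Keep numbers that already carry a unit; append the unit otherwise.
        return m.group(0) if m.group(2) else f"{m.group(1)} {unit}"

    return NUMBER.sub(add_unit, body).strip()


if __name__ == "__main__":
    sample = "All measurements in mm. Carapace 2.40 long, 1.90 wide. Total length 5.20."
    print(make_self_contained(sample))
```

Running the example prints "Carapace 2.40 mm long, 1.90 mm wide. Total length 5.20 mm.", so each value can be extracted without reading the rest of the description.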
Harnessing the NEON data revolution to advance open environmental science with a diverse and data-capable community. Ecosphere 12(12):e03833.
This project interrogates the talk of a workshop leader, and the whole-meeting talk, among a group of scientists gathered at a workshop to discuss cyberinfrastructure and the sharing of both ‘light’ and ‘dark’ data in the sciences. It analyzes the discourses at work in the workshop talk to examine social relations, interdisciplinary identities, concerns, and commonalities in the sciences, and how these relate to emerging opportunities for computing and data sharing in the cloud. The findings point to the efficacy of organizing scientists around data collection processes for collaborative work, as opposed to grouping them by data type, discipline, work sector, or collection location. This research provides an opportunity to consider the democratization of data, academic boundaries in the sciences, and the interdisciplinary and collaborative problem-solving processes that happen in groups across academic and applied contexts.
Macrosystem‐scale research is supported by many ecological networks of people, infrastructure, and data. However, no network is sufficient to address all macrosystems ecology research questions, and there is much to be gained by conducting research and sharing resources across multiple networks. Unfortunately, conducting macrosystem research across networks is challenging due to the diversity of expertise and skills required, as well as issues related to data discoverability, veracity, and interoperability. The ecological and environmental science community could substantially benefit from networking existing networks to leverage past research investments and spur new collaborations. Here, we describe the need for a “network of networks” (NoN) approach to macrosystems ecological research and articulate both the challenges and potential benefits associated with such an effort. We describe the challenges brought by rapid increases in the volume, velocity, and variety of “big data” ecology and highlight how a NoN could build on the successes and creativity within component networks, while also recognizing and improving upon past failures. We argue that a NoN approach requires careful planning to ensure that it is accessible and inclusive, incorporates multimodal communications and ways to interact, supports the creation, testing, and promulgation of community standards, and ensures that individuals and groups receive appropriate credit for their contributions. Additionally, a NoN must recognize important trade‐offs in network architecture, including how the degree of centralization of people, infrastructure, and data influences network scalability and creativity. If implemented carefully and thoughtfully, a NoN has the potential to substantially advance our understanding of ecological processes, characteristics, and trajectories across broad spatial and temporal scales in an efficient, inclusive, and equitable manner.