Sites in the Long Term Ecological Research (LTER) Network have now contributed more than 5,000 data packages to the LTER Network Information System (NIS). This corpus of data and metadata allows us to analyze characteristics of LTER program data, including temporal coverage, data format, rate of submission, data volume, and ecological characteristics (e.g., ecosystems, processes, organisms). In addition, the data/metadata congruence checks built into the Provenance Aware Synthesis Tracking Architecture (PASTA), which underlies the NIS, allow us to examine the quality of submitted metadata. Initial records of data use and citation provide a means to evaluate how effectively this repository disseminates data to a broader community: 89 citations of data packages in 52 articles have been documented to date.
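The corpus characteristics listed above (format, submission rate, volume, temporal coverage) reduce to simple tallies once a few fields have been harvested from each package's EML. The sketch below assumes that harvesting has already happened; the `PackageRecord` class, its field names, and `summarize_corpus` are hypothetical illustrations, not part of the NIS or PASTA.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class PackageRecord:
    """Hypothetical fields harvested from one package's EML metadata."""
    package_id: str
    data_format: str   # entity format, e.g. "text/csv"
    begin: date        # start of temporal coverage
    end: date          # end of temporal coverage
    size_bytes: int    # total volume of the package's data entities
    submitted: date    # date the package entered the repository


def summarize_corpus(records):
    """Tally formats, submissions per year, total volume, and mean coverage span."""
    formats = Counter(r.data_format for r in records)
    per_year = Counter(r.submitted.year for r in records)
    total_bytes = sum(r.size_bytes for r in records)
    # Temporal coverage of each package, expressed in (approximate) years.
    spans = [(r.end - r.begin).days / 365.25 for r in records]
    mean_span_years = sum(spans) / len(spans) if spans else 0.0
    return {
        "formats": formats,
        "submissions_per_year": per_year,
        "total_bytes": total_bytes,
        "mean_span_years": mean_span_years,
    }


if __name__ == "__main__":
    demo = [
        PackageRecord("site-a.1.1", "text/csv", date(2000, 1, 1), date(2010, 1, 1), 1_000, date(2013, 5, 1)),
        PackageRecord("site-b.2.1", "text/csv", date(1990, 1, 1), date(2010, 1, 1), 4_000, date(2013, 9, 1)),
    ]
    print(summarize_corpus(demo))
```

Averaging coverage spans is only one choice; a real analysis might instead report the distribution of spans per site or per ecosystem.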
We describe the process by which the Long-Term Ecological Research (LTER) Network standardised its metadata through the adoption of the Ecological Metadata Language (EML). We describe the strategies developed to improve motivation and to complement the information technology resources available at the LTER sites. EML implementation is presented as a mapping process that was accomplished site by site in stages, with metadata quality progressing over time from 'discovery level' to rich-content level. As of publication, over 6000 rich-content standardised records have been published using EML, potentially enabling the goal of machine-mediated, metadata-driven data synthesis.
Brunt, J. (2009) 'The Long-Term Ecological Research community metadata standardisation project: a progress report', Int. J. Metadata Semantics and Ontologies, Vol.
Biographical notes: I. San Gil received his PhD in Mechanical Engineering from Yale University in 2001. He is currently the metadata project coordinator and senior systems analyst for the National Biological Information Infrastructure and the Long-Term Ecological Research Network. His current research interests include metadata management systems, bioinformatics, and metadata-driven systems. Karen Baker holds an MS from the University of California at Los Angeles and is currently the information manager at Scripps Institution of Oceanography for CalCOFI, Palmer LTER, and California Current Ecosystem LTER. John Campbell holds a PhD from the State University of New York and is currently the information manager at Hubbard Brook LTER. Ellen G. Denny holds an MFS from the Yale School of Forestry and Environmental Studies and is part of the information management team for the Hubbard Brook LTER site. Kristin Vanderbilt received her PhD (Biology) at the University of New Mexico, where she is currently an Associate Research Professor and the Sevilleta LTER Information Manager.
Brian Riordan received an MS from the University of Alaska Fairbanks and currently works in the private sector on GIS. Rebecca Koskela is a Bioinformatics Specialist at the Arctic Region Supercomputing Center on the University of Alaska Fairbanks campus; she was previously a member of the senior management team at the Aventis Cambridge Genome Center. Jason Downing is the current Information Manager at the Bonanza Creek LTER. Sabine Grabner received her MS (Meteorology) from the … James Brunt is an Associate Director for Information Management of the LTER Network Office; he leads and supervises a staff of six who operate and maintain LTER cyberinfrastructure, design and develop the LTER Network Information System, and steward LTER Network databases and websites. He pursued an interdisciplinary MS combining Ecology, Computer Science, and Experimental Statistics at NMSU.
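The EML standardisation abstract above frames implementation as a per-site mapping that starts at "discovery level" and grows toward rich content. A minimal sketch of that first stage, assuming a hypothetical site record with keys `title`, `summary`, `pi_given`, and `pi_surname`, maps those fields onto a skeletal EML document with Python's standard `xml.etree.ElementTree`. The namespace targets EML 2.2.0 (adjust for other versions), the output has not been validated against the EML schema, and the `metadata_level` heuristic (rich content means at least one `dataTable` with an `attributeList`) is an illustrative assumption rather than the project's actual criterion.

```python
import xml.etree.ElementTree as ET

# EML 2.2.0 namespace; an assumption, change it for other EML versions.
EML_NS = "https://eml.ecoinformatics.org/eml-2.2.0"
ET.register_namespace("eml", EML_NS)


def add_person(parent, tag, given, sur):
    """Append a <creator>/<contact>-style party with an individualName."""
    party = ET.SubElement(parent, tag)
    name = ET.SubElement(party, "individualName")
    ET.SubElement(name, "givenName").text = given
    ET.SubElement(name, "surName").text = sur
    return party


def map_site_record(record, package_id, system="hypothetical-site-system"):
    """Map a hypothetical site metadata record onto discovery-level EML."""
    root = ET.Element(f"{{{EML_NS}}}eml", {"packageId": package_id, "system": system})
    dataset = ET.SubElement(root, "dataset")
    # Child order follows the EML dataset sequence: title, creator, abstract, contact.
    ET.SubElement(dataset, "title").text = record["title"]
    add_person(dataset, "creator", record["pi_given"], record["pi_surname"])
    abstract = ET.SubElement(dataset, "abstract")
    ET.SubElement(abstract, "para").text = record["summary"]
    add_person(dataset, "contact", record["pi_given"], record["pi_surname"])
    return root


def metadata_level(root):
    """Heuristic: attribute-level table metadata marks a rich-content record."""
    rich = root.find("dataset/dataTable/attributeList") is not None
    return "rich-content" if rich else "discovery"


if __name__ == "__main__":
    site_record = {"title": "Plant biomass at plot P1", "summary": "Annual harvests.",
                   "pi_given": "Ada", "pi_surname": "Smith"}
    doc = map_site_record(site_record, "site-a.1.1")
    print(metadata_level(doc))
    print(ET.tostring(doc, encoding="unicode"))
```

Moving a record to rich content would then mean adding entity-level elements (data tables with attribute lists, units, formats) to the same tree in later mapping stages.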
The Environmental Data Initiative (EDI) is a continuation and expansion of the original United States Long-Term Ecological Research Program (US-LTER) data repository, which went into production in 2013. Building on decades of LTER data management experience, EDI is addressing the challenge of publishing a diverse corpus of research data (Servilla et al. 2016). EDI’s accomplishments span all aspects of the data curation and publication lifecycle, including repository cyberinfrastructure, outreach and training, and enhancements to the data documentation methods used by the environmental and ecological research communities. EDI manages almost 43,000 unique data packages and their revisions from a community of nearly 2,300 individual data authors. Most packages are contributed by LTER sites, and they are openly accessible and documented with rich science metadata in the Ecological Metadata Language (EML) standard. Here we present how EDI achieves the FAIR data principles (Wilkinson et al. 2016, Stall et al. 2017) and report data use metrics as a measure of success. The FAIR principles serve as benchmarks for EDI’s operation and management: the data we curate are Findable because they reside in an open repository, with unique and persistent digital object identifiers (DOIs) and standard metadata indexed as a searchable resource; they are Accessible through industry-standard protocols and, in most cases, under an open-access license (access control is available if required); they are Interoperable because they are archived in commonly used file formats and both metadata and data are machine readable and accessible; and rich, high-quality science metadata, with automated congruence and completeness checking, render the data fit for Reuse in multiple contexts and environments, along with easily generated provenance that documents their lineage.
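Each FAIR benchmark in the paragraph above corresponds to something a metadata record either has or lacks, so a completeness check can be phrased letter by letter. The sketch below is illustrative only: the record is a plain dict with hypothetical keys (`doi`, `keywords`, `access_url`, `license`, `format`, `attributes`, `provenance`), the set of "commonly used" formats is an assumption, and these rules are not EDI's actual checks.

```python
import re

# Loose DOI syntax check: "10." + registrant code + "/" + suffix.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
# Hypothetical set of formats treated as "commonly used" for interoperability.
OPEN_FORMATS = {"text/csv", "text/plain", "application/json", "application/x-netcdf"}


def fair_issues(record):
    """Return FAIR-related gaps (prefixed F/A/I/R) for one metadata record."""
    issues = []
    # Findable: persistent identifier plus indexable metadata.
    if not DOI_RE.match(record.get("doi") or ""):
        issues.append("F: missing or malformed DOI")
    if not record.get("keywords"):
        issues.append("F: no keywords to index for search")
    # Accessible: standard protocol and a stated license or access policy.
    if not (record.get("access_url") or "").startswith("https://"):
        issues.append("A: no HTTPS access URL")
    if not record.get("license"):
        issues.append("A: no license or access policy stated")
    # Interoperable: a commonly used, machine-readable file format.
    if record.get("format") not in OPEN_FORMATS:
        issues.append("I: data not in a commonly used format")
    # Reusable: attribute-level metadata and documented lineage.
    if not record.get("attributes"):
        issues.append("R: no attribute-level metadata")
    if not record.get("provenance"):
        issues.append("R: no provenance recorded")
    return issues
```

Returning a list of labeled gaps, rather than a single pass/fail flag, mirrors the idea that a package can satisfy some principles while still needing work on others.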
The success of this approach is demonstrated by the number, spatial extent, and temporal extent of recent re-analyses and synthesis efforts using these data. Although formal data citation is not yet common practice, a Google Scholar search reveals over 400 journal articles crediting data re-use through an EDI DOI. However, despite improved data availability, researchers still report that the largest time investment in synthesis projects is discovering, cleaning, and combining primary datasets until all data are fully understood and converted to a common format. Starting with long-term biodiversity observation data, EDI is addressing this issue by pre-harmonizing thematically similar datasets. Positioned between the data author’s specific data format and larger biodiversity data stores or synthesis projects, this approach provides uniform access without loss of ancillary information. Data managers can carry out this pre-harmonization step because the dataset retains all original information, with no aggregation and no question-specific decisions about omitting or cleaning data. The data remain distributed across distinct datasets, allowing long-term observations to be updated asynchronously. Adding specific, standardized metadata makes them easily discoverable.
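The key property of pre-harmonization described above is that data are reshaped into a shared structure without aggregating or dropping anything. A minimal sketch, assuming a wide site table (one column per taxon) given as a list of dicts, reshapes it into a long observation table; the `Observation` schema and `pre_harmonize` function are hypothetical and do not represent EDI's actual harmonized format. Blank cells become `None` rather than being omitted, and every column outside the shared schema is carried along as ancillary information.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Observation:
    """One row of a hypothetical harmonized, long-format observation table."""
    location_id: str
    observed_on: str
    taxon: str
    variable: str
    value: Optional[float]  # None keeps a blank cell instead of dropping it
    ancillary: dict = field(default_factory=dict)  # all other original columns


def pre_harmonize(rows, location_col, date_col, taxon_cols, variable="count"):
    """Reshape a wide, site-specific table into long format without dropping data."""
    core = {location_col, date_col, *taxon_cols}
    observations = []
    for row in rows:
        # Columns outside the shared schema travel along as ancillary information.
        ancillary = {k: v for k, v in row.items() if k not in core}
        for taxon in taxon_cols:
            raw = row.get(taxon)
            observations.append(Observation(
                location_id=row[location_col],
                observed_on=row[date_col],
                taxon=taxon,
                variable=variable,
                value=float(raw) if raw not in (None, "") else None,
                ancillary=dict(ancillary),
            ))
    return observations
```

Because the function neither filters nor summarizes, the decisions about cleaning and aggregation are left to whichever synthesis project later consumes the harmonized table.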
Many data analyses use automated workflows to ingest data from public repositories and therefore rely on data packages of high structural quality. The Long Term Ecological Research (LTER) Network now screens all packages entering its long-term archive to ensure completeness and quality, and to confirm that metadata and data are structurally congruent, i.e., that the data types and formats expressed in the metadata agree with those found in the data entities. The EML Congruence Checker (ECC) is a component of the LTER Provenance Aware Synthesis Tracking Architecture (PASTA); it operates on data tables in packages described with Ecological Metadata Language (EML), using the EML Data Manager Library, written in Java. Checking is extensible to other data types and customizable via a template. Reports are retained as part of the submitted data package, and the summaries presented here reflect the general usability of LTER data for a variety of purposes. On average in 2015, site-contributed data in the LTER catalog were 95% compliant (valid) with the current suite of checks.
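The ECC itself is written in Java on the EML Data Manager Library; the Python sketch below is not that implementation and does not use its API. It only illustrates what a structural congruence check does, assuming the metadata has been reduced to a hypothetical list of attribute dicts (`name`, `type`, and a `format` for dates) and that `NA` and empty cells are declared missing-value codes. Each check produces a `(check, status, message)` entry, loosely echoing the idea of a retained report.

```python
import csv
import io
from datetime import datetime

# Hypothetical mapping from metadata storage types to parsers.
PARSERS = {"integer": int, "float": float, "string": str}
# Hypothetical mapping from metadata date formats to strptime patterns.
DATE_FORMATS = {"YYYY-MM-DD": "%Y-%m-%d", "YYYY": "%Y"}


def parse_value(value, attr):
    """Raise ValueError (or KeyError for an unknown type/format) if value is incongruent."""
    if attr["type"] == "date":
        datetime.strptime(value, DATE_FORMATS[attr["format"]])
    else:
        PARSERS[attr["type"]](value)


def check_congruence(attributes, csv_text, missing_codes=("NA", "")):
    """Compare a CSV table against its attribute metadata; return a list of findings."""
    report = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    expected = [a["name"] for a in attributes]
    if header == expected:
        report.append(("header", "valid", "column names match metadata"))
    else:
        report.append(("header", "error", f"expected {expected}, found {header}"))
    for line_no, row in enumerate(reader, start=2):
        # Field count must match the number of declared attributes.
        if len(row) != len(attributes):
            report.append(("fieldCount", "error",
                           f"line {line_no}: {len(row)} fields, expected {len(attributes)}"))
            continue
        for value, attr in zip(row, attributes):
            if value in missing_codes:
                continue  # declared missing values are congruent by definition
            try:
                parse_value(value, attr)
            except (ValueError, KeyError):
                report.append(("typeCheck", "error",
                               f"line {line_no}, column {attr['name']}: {value!r} is not {attr['type']}"))
    return report


def is_valid(report):
    """A table is compliant when no check reported an error."""
    return all(status != "error" for _, status, _ in report)
```

Keeping every finding, instead of stopping at the first error, makes the report useful to the data author who has to fix the package.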