The Terra Populus project (TerraPop) addresses a range of data management, curation, and preservation challenges posed by spatiotemporal population and environmental data. In this article, we describe our approaches to these challenges, focusing on geospatial data workflows and the associated provenance metadata. The goal of TerraPop is to enable research, learning, and policy analysis by providing integrated spatiotemporal data describing people and their environment. To do so, TerraPop is assembling a globe-spanning, temporally extensive collection of high-quality population and environmental data, documenting that collection thoroughly, and developing a Web-based data access system that lets users assemble customized, integrated data sets drawn from a variety of data sources and formats. We describe TerraPop's collection strategies, detail the geospatial workflows used both to prepare data for ingest into the project database and to transform data across formats for dissemination, and discuss the system used to capture and manage provenance metadata throughout the project. A key aspect of the project is the development of current and historical administrative-unit boundaries for the entire globe that can be linked to census data. These boundaries are the linchpin of TerraPop's data integration strategy and constitute an important data set in their own right.
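The linkage between census data and administrative-unit boundaries can be illustrated with a minimal sketch. It assumes, hypothetically, that each census count and each boundary polygon carry a shared unit code plus the census year the boundary version applies to; the names `BoundaryUnit` and `link_census_to_boundaries`, and the population-density calculation, are illustrative and do not describe TerraPop's actual schema or workflow. Keying on the year as well as the code reflects the fact that the same unit can have different boundaries at different censuses.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical key: (administrative unit code, census year). Including the year
# lets one unit code map to several historical boundary versions.
UnitKey = Tuple[str, int]


@dataclass(frozen=True)
class BoundaryUnit:
    unit_code: str   # hypothetical code shared by the boundary file and the census table
    year: int        # census year this boundary version applies to
    area_km2: float  # area of the unit's polygon in square kilometres


def link_census_to_boundaries(
    units: List[BoundaryUnit],
    population: Dict[UnitKey, int],
) -> Tuple[Dict[UnitKey, float], List[UnitKey]]:
    """Join census population counts to boundary units and derive density.

    Returns a mapping of (unit_code, year) to persons per km^2, plus the census
    keys that had no matching boundary version (left for curators to resolve).
    """
    by_key = {(u.unit_code, u.year): u for u in units}
    density: Dict[UnitKey, float] = {}
    unmatched: List[UnitKey] = []
    for key, count in population.items():
        unit = by_key.get(key)
        if unit is None:
            unmatched.append(key)  # census record with no boundary for that year
            continue
        density[key] = count / unit.area_km2
    return density, unmatched
```

For example, a unit "A01" whose boundary shrinks from 50 km² in 2000 to 40 km² in 2010 yields two separate density values even if its reported population is unchanged, while a census code with no boundary on file is returned in the unmatched list rather than silently dropped.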
Big geospatial data is an emerging sub-area of geographic information science, big data, and cyberinfrastructure, and it poses two distinctive challenges to these and other cognate disciplines. First, raster and vector data structures and analyses have developed along largely separate paths over the last twenty years, which impedes researchers who use big data platforms that do not support integrating the two data models. Second, big spatial data repositories have yet to be integrated with big data computation platforms in ways that allow researchers to analyze big geospatial datasets across both space and time. IPUMS-Terra, a National Science Foundation cyberinfrastructure project, begins to address these challenges. IPUMS-Terra is a spatial data infrastructure project, part of the IPUMS (Integrated Public Use Microdata Series) data infrastructure, that provides a unified framework for accessing, analyzing, and transforming big, heterogeneous spatiotemporal data. It supports big geospatial data analysis and provides integrated big geospatial services to its users. As IPUMS-Terra's data volume grows, we seek to integrate geospatial platforms that will scale our analyses and relieve current bottlenecks in our system. However, our work shows that big geospatial analysis still faces unresolved challenges, the most pertinent being the lack of a unified framework for scalable, integrated vector and raster data analysis. We conducted a comparative analysis of PostgreSQL with PostGIS and SciDB, and concluded that SciDB is the better of the two platforms for scalable raster zonal analyses.
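Zonal analysis, the operation used to compare PostGIS and SciDB, summarizes the cells of a value raster within zones such as administrative units. The sketch below is a minimal single-machine illustration, assuming the zone polygons have already been rasterized onto the same grid as the value raster; it is not the IPUMS-Terra, PostGIS, or SciDB implementation, and the function name `zonal_mean` and the `nodata` convention are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence


def zonal_mean(
    values: Sequence[Sequence[float]],
    zones: Sequence[Sequence[Optional[int]]],
    nodata: Optional[float] = None,
) -> Dict[int, float]:
    """Mean of value-raster cells grouped by the zone ID of the co-located zone cell.

    Assumes both rasters share the same grid; a zone of None marks a cell that
    lies outside every zone, and cells equal to `nodata` are skipped.
    """
    if len(values) != len(zones) or any(
        len(v_row) != len(z_row) for v_row, z_row in zip(values, zones)
    ):
        raise ValueError("value and zone rasters must have the same shape")

    # Per-zone running sums and counts; the mean is taken only at the end.
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is None or value == nodata:
                continue  # outside all zones, or no data for this cell
            sums[zone] += value
            counts[zone] += 1
    return {zone: sums[zone] / counts[zone] for zone in sums}
```

Keeping per-zone sums and counts, rather than running means, makes the computation decomposable: partial results from separate raster tiles can simply be added together before the final division, which is the property a platform that splits large rasters into chunks can exploit to scale the analysis.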