The process of developing a digital collection in the context of a research project often involves a pipeline pattern during which data growth, data types, and data authenticity need to be assessed iteratively in relation to the different research steps and in the interest of archiving. Throughout a project’s lifecycle, curators organize newly generated data, clean and integrate legacy data when it exists, and decide what data will be preserved for the long term. Although these actions should be part of a well-oiled data management workflow, they are difficult to carry out in practice if the collection is very large and heterogeneous, or is accessed by several researchers at the same time. There is a need for data management solutions that can help curators with efficient and on-demand analyses of their collection so that they remain well-informed about its evolving characteristics. In this paper, we describe our efforts towards developing a workflow to leverage open science High Performance Computing (HPC) resources for routinely and efficiently conducting data management tasks on large collections. We demonstrate that HPC resources and techniques can significantly reduce the time for accomplishing critical data management tasks, and enable dynamic archiving throughout the research process. We use a large archaeological data collection with a long and complex formation history as our test case. We share our experiences in adopting open science HPC resources for large-scale data management, which entails understanding usage of the open source HPC environment and training users. These experiences can be generalized to meet the needs of other data curators working with large collections.
The chora of Chersonesos, located in southeast Crimea, Ukraine, is a uniquely well-preserved ancient agricultural territory in danger of destruction by urban encroachment and coastal erosion. This study investigates the use of remotely sensed data for mapping archaeological features in the territory and for monitoring the urban encroachment and coastal erosion that threaten this historic monument. Historic aerial photography and Corona photography are being analyzed to map archaeological features. A digital elevation model (DEM) created via repeat-pass interferometry from European Remote Sensing (ERS) satellite data is being used in conjunction with the multispectral imagery and geophysical and geomorphological in situ data to study the ancient settlement of the area and to identify and monitor vulnerable parts of the chora. Preliminary results from classification of Landsat multispectral data and comparisons of Corona photography and Indian Remote Sensing (IRS) panchromatic data indicate a massive increase in the urban land cover in the study area during the last few decades. A Geographical Information System (GIS), for which the remotely sensed data provides the spatial framework for integrating site-specific and regional information, is being developed for the site as an aid in land use planning, site protection, and development of an archaeological park.