The collection, organization, and long-term preservation of resources are the raison d'être of archives and archivists. The archival community, however, has largely neglected science data, assuming that such data lie outside the bounds of its professional concerns. Scientists, on the other hand, increasingly recognize that they lack the skills and expertise needed to meet the demands being placed on them with regard to data curation and are seeking the help of “data archivists” and “data curators.” This represents a significant opportunity for archivists and archival scholars, but one that can be realized only if they better understand the scientific context.
The study found that significant time is required to contact and negotiate with rights holders and that the biggest obstacle to getting permission is non-response. Of the requests that receive a response, the vast majority are granted. While few requests were denied, the data suggest that commercial copyright holders are much more likely to deny permission than other types of copyright holders. The data also show that adherence to the common policy of displaying online only those documents with explicit permission will likely result in substantially incomplete online collections.
SEAD – a project funded by the US National Science Foundation’s DataNet program – has spent the last five years designing, building, and deploying an integrated set of services to better connect scientists’ research workflows to data publication and preservation activities. Throughout the project, SEAD has promoted the concept and practice of “active curation,” which consists of capturing data and metadata early and refining them throughout the data life cycle. In promoting active curation, our team saw an opportunity to develop tools that would help scientists better manage data for their own use, improve team coordination around data, implement practices that would serve the data better over time, and seamlessly connect with data repositories to ease the burden of sharing and publishing.
SEAD has worked with 30 projects, dozens of researchers, and hundreds of thousands of files, providing us with ample opportunities to learn about data and metadata, integrating with researchers’ workflows, and building tools and services for data. In this paper, we discuss the lessons we have learned and suggest how they might guide future data infrastructure development efforts.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations, which display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.