We analyze the people and infrastructure involved in the building, sustaining, and curation of large astronomy sky surveys. Our research assesses what new infrastructures, divisions of labor, knowledge, and expertise are necessary for the proper care of data. Between May 2011 and February 2012, we conducted fourteen interviews employing Sloan Digital Sky Survey (SDSS) data use as the focus. SDSS is a multi-faceted, multi-phased data-driven telescope project with hundreds of collaborators and thousands of users of the open data. The Follow the Data interview protocol identifies a single publication authored by each interviewee and uses it as a lens looking backward and forward to identify data uses leading into and out of the publication. The interviews revealed the ways these astronomers discover, locate, retrieve, and store external data for their research. Any given astronomy research project may employ multiple methods to discover, locate, retrieve, and store multiple datasets. Our research finds that informal and formal methods are used to discover and locate data, including person-to-person contact. Data retrieval and storage methods are often determined by the size of the dataset and the amount of infrastructure available to the researcher. Astronomy research practices are evolving rapidly with access to more data and better tools. The poster presentation will report further on how those data are used and reused in astronomy.
Data are proliferating far faster than they can be captured, managed, or stored. What types of data are most likely to be used and reused, by whom, and for what purposes? Answers to these questions will inform information policy and the design of digital libraries. We report findings from semi-structured interviews and field observations to investigate characteristics of data use and reuse and how those characteristics vary within and between scientific communities. The two communities studied are researchers at the Center for Embedded Network Sensing (CENS)
Within information systems, a significant aspect of search and retrieval across information objects, such as datasets, journal articles, or images, relies on the identity construction of the objects. This paper uses identity to refer to the qualities or characteristics of an information object that make it definable and recognizable, and can be used to distinguish it from other objects. Identity, in this context, can be seen as the foundation from which citations, metadata and identifiers are constructed. In recent years the idea of including datasets within the scientific record has been gaining significant momentum, with publishers, granting agencies and libraries engaging with the challenge. However, the task has been fraught with questions of best practice for establishing this infrastructure, especially with regard to how citations, metadata and identifiers should be constructed. These questions suggest a problem with how dataset identities are formed, such that an engagement with the definition of datasets as conceptual objects is warranted. This paper explores some of the ways in which scientific data is an unruly and poorly bounded object, and goes on to propose that in order for datasets to fulfill the roles expected for them, the following identity functions are essential for scholarly publications: (i) the dataset is constructed as a semantically and logically concrete object, (ii) the identity of the dataset is embedded, inherent and/or inseparable, (iii) the identity embodies a framework of authorship, rights and limitations, and (iv) the identity translates into an actionable mechanism for retrieval or reference.
This article reports on the transfer of a massive scientific dataset from a national laboratory to a university library, and from one kind of workforce to another. We use the transfer of the Sloan Digital Sky Survey (SDSS) archive to examine the emergence of a new workforce for scientific research data management. Many individuals with diverse educational backgrounds and domain experience are involved in SDSS data management: domain scientists, computer scientists, software and systems engineers, programmers, and librarians. These types of positions have been described using terms such as research technologist, data scientist, e-science professional, data curator, and more. The findings reported here are based on semi-structured interviews, ethnographic participant observation, and archival studies from 2011 to 2013. The library staff conducting the data storage and archiving of the SDSS archive faced two performance problems. The preservation specialist and the system administrator worked together closely to discover and implement solutions to the slow data transfer and verification processes. The team overcame these slow-downs by problem solving, working in a team, and writing code. The library team lacked the astronomy domain knowledge necessary to meet some of their preservation and curation goals. The case study reveals the variety of expertise, experience, and individuals essential to the SDSS data management process. A variety of backgrounds and educational histories emerge in the data managers studied. Teamwork is necessary to bring disparate expertise together, especially between those with technical and domain education. The findings have implications for data management education, policy and relevant stakeholders. This article is part of continuing research on Knowledge Infrastructures.
We are introducing to the ASIS&T community what will be, to date, the most extensive study of data practices for astronomy and astrophysics from the Information Science field. We approach astronomy data curation with three questions: 1) What are the data management, curation, and sharing practices of astronomers and astronomy data centers, and how have they developed? 2) Who uses what data when, with whom, and why? 3) What data are most important to curate, how, for whom, and for what purposes? The first question is about what people do, how they manage data, and what counts as relevant research data to generate, use, keep, and discard. The second question addresses the social contexts, networks, and communities within which these practices occur. The third question focuses on tasks of data curation, such as deciding what data will be of future use to others, assigning responsibilities for organizing and describing datasets for use, identifying incentives and disincentives for individuals or groups to curate their data, and developing tools and services necessary to exploit those data. The poster will summarize findings from our first year of research. Our research team, based at the University of California, Los Angeles' Center for Embedded Networked Sensing, is part of a five‐year project, the Data Conservancy (DC), funded by the National Science Foundation's DataNet Initiative (Data Conservancy, 2010). DC's partner institutions are investigating data use, sharing, and preservation in multiple fields of science. UCLA is conducting a deep case study of astronomy and astrophysics. DC partners at Cornell, Illinois, the National Center for Atmospheric Research, and the National Snow and Ice Data Center are studying data practices in several other science domains. The DC is a research project that will offer new insights into data practices in an array of physical and life sciences. Research will be translated into practice via the design of a data repository.
The poster will summarize findings from the first year of research on astronomers and astronomy data. What can the information sciences learn from astronomy about data curation? Astronomy is a pioneer in big science projects with large‐scale digital datasets. Telescopes on the ground and in orbit can stream data instantly and internationally. Large instruments can stream data directly into institutional data centers; those data may be available immediately or in periodic data releases some months later. Collaborative use of instruments has fostered the ongoing development of standards for data formats and interoperability. New technologies brought changes to the profession and to research practices accompanying data production, analysis, sharing, and preservation. The investments in shared research instruments and information technologies that characterize big science also support smaller‐scale projects by astronomers using “virtual observatories” from their offices, making data management a more personal responsibility. Astronomy offers a rich and comp...