We present preliminary findings from a three-year research project comprising longitudinal qualitative case studies of data practices in four large, distributed, highly multidisciplinary scientific collaborations. This project follows a 2 × 2 research design: two of the collaborations are big science while two are little science; two have completed data collection activities while two are ramping up data collection. This paper is centered on one of these collaborations, a project bringing together scientists to study subseafloor microbial life. This collaboration is little science, characterized by small teams using small amounts of data to address specific questions. Our case study employs participant observation in a laboratory, interviews (n = 49 to date) with scientists in the collaboration, and document analysis. We present a data workflow that is typical for many of the scientists working in the observed laboratory. In particular, we show that, although this workflow results in datasets apparently similar in form, a large degree of heterogeneity nevertheless exists across scientists in this laboratory in terms of the methods they employ to produce these datasets, even between scientists working on adjacent benches. To date, most studies of data in little science focus on heterogeneity in terms of the types of data produced: this paper adds another dimension of heterogeneity to existing knowledge about data in little science. This additional dimension makes more complex the task of management and curation of data for subsequent reuse. Furthermore, the nature of the factors that contribute to heterogeneity of methods suggests that this dimension of
Scientists in all fields face challenges in managing and sustaining access to their research data. The larger and longer-term the research project, the more likely scientists are to have resources and dedicated staff to manage their technology and data, leaving those scientists whose work is based on smaller and shorter-term projects at a disadvantage. The volume and variety of data to be managed vary by many factors, only two of which are the number of collaborators and length of the project. As part of an NSF project to conceptualize the Institute for Empowering Long Tail Research, we explored opportunities offered by Software as a Service (SaaS). These cloud-based services are popular in business because they reduce costs and labor for technology management, and are gaining ground in scientific environments for similar reasons. We studied three settings where scientists conduct research in small and medium-sized laboratories. Two were NSF Science and Technology Centers (CENS and C-DEBI) and the third was a workshop of natural reserve scientists and managers. These laboratories have highly diverse data and practices, make minimal use of standards for data or metadata, and lack resources for data management or sustaining access to their data, despite recognizing the need. We found that SaaS could address technical needs for basic document creation, analysis, and storage, but did not support the diverse and rapidly changing needs for sophisticated domain-specific tools and services. These are much more challenging knowledge infrastructure requirements that require long-term investments by multiple stakeholders.
This article discusses the burgeoning “collections as data” movement within the fields of digital libraries and digital humanities. Faculty at the University of Utah’s Marriott Library are developing a collections as data strategy by leveraging existing Digital Library and Digital Matters programs. By selecting various digital collections, small- and large-scale approaches to developing open datasets are explored. Five case studies chronicling this strategy are reviewed, along with testing the datasets using various digital humanities methods, such as text mining, topic modeling, and GIS (geographic information system).
Purpose: This paper aims to determine whether the digital humanities technique of topic modeling would reveal interesting patterns in a corpus of library-themed literature focused on the future of libraries, and to pioneer a collaboration model in librarian-led digital humanities projects. By developing the project, librarians learned how to better support digital humanities by actually doing digital humanities, and gained insight into the variety of approaches taken by researchers and commenters to the idea of the future of libraries. Design/methodology/approach: The researchers collected a corpus of over 150 texts (articles, blog posts, book chapters, websites, etc.) that all addressed the future of the library. They ran several instances of latent Dirichlet allocation style topic modeling on the corpus using the programming language R. Once they produced a run in which the topics were cohesive and discrete, they produced word clouds of the words associated with each topic, visualized topics through time, and examined in detail the top five documents associated with each topic. Findings: The research project provided an effective way for librarians to gain practical experience in digital humanities and develop a greater understanding of collaborative workflows in digital humanities. By examining a corpus of library-themed literature, the researchers gained new insight into how the profession grapples with the idea of the future and an appreciation for topic modeling as a form of literature review. Originality/value: Topic modeling a future-themed corpus of library literature is a unique research project and provides a way to support collaboration between library faculty and researchers from outside the library.
Discovery is the researcher's dream. The dream of a straightforward search that allows information seekers to find the content they are looking for and, more importantly, relevant content they do not yet know about. Librarians, system vendors and content providers aim to materialize this dream of efficient and accurate discovery motivated by rationales that vary from the noble goals of knowledge creation and sharing to profit-driven commercial grounds.