Abstract. The paper outlines an experiment conducted in two different academic environments, in which FIT (Framework for Integrated Test) tests were used as a functional requirements specification. Common challenges for functional requirements specifications are identified, and prose specifications and FIT user acceptance tests are compared on how well each helps developers overcome these challenges. Experimental data and participant feedback are examined to evaluate whether developers can work from requirements expressed as FIT tests to produce a design and implementation.
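For readers unfamiliar with FIT, the sketch below illustrates what a requirement expressed as a FIT acceptance test might look like: a table of example inputs and expected outputs, backed by a small fixture class. The domain (a discount rule), the class name eg.Discount, and its field and method names are hypothetical illustrations, not examples taken from the paper.

```java
// Hypothetical FIT column-style acceptance test, as it might appear in a
// requirements document:
//
//   | eg.Discount             |
//   | amount   | discount()   |
//   | 100.00   | 0.00         |
//   | 1000.00  | 50.00        |
//
// Each row gives an input and the expected output; FIT runs the table
// against the fixture below and marks each expectation cell pass or fail.
package eg;

import fit.ColumnFixture;

public class Discount extends ColumnFixture {
    // Input column: FIT sets this public field from the "amount" cell.
    public double amount;

    // Output column: FIT calls this method and compares the result
    // with the "discount()" cell in the same row.
    public double discount() {
        return amount > 500.00 ? amount * 0.05 : 0.00;
    }
}
```

In a setup like the one described in the abstract, developers receive tables of this kind in place of prose requirements and implement the system until every cell in the tables passes.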
This paper previews the imminent flood of scientific data expected from the next generation of experiments, simulations, sensors, and satellites. To be exploited by search engines and data mining tools, such experimental data must be annotated with relevant metadata describing its provenance, content, experimental conditions, and so on. The need to automate the progression from raw data to information to knowledge is briefly discussed. The paper argues the case for creating new types of digital libraries for scientific data that offer the same management services as conventional digital libraries, in addition to other data-specific services. Some likely implications of both the Open Archives Initiative and e-Science data for the future role of university libraries are briefly mentioned. A substantial subset of this e-Science data needs to be archived and curated for long-term preservation. Some of the issues involved in the digital preservation of both scientific data and the programs needed to interpret that data are reviewed. Finally, the implications of this wealth of e-Science data for the Grid middleware infrastructure are highlighted.
Objective. This study informs efforts to improve the discoverability of and access to biomedical datasets by providing a preliminary estimate of the number and types of datasets generated annually by research funded by the U.S. National Institutes of Health (NIH). It focuses on datasets that are “invisible,” i.e., not deposited in a known repository.
Methods. We analyzed NIH-funded journal articles published in 2011, cited in PubMed, and deposited in PubMed Central (PMC) to identify those indicating that data were submitted to a known repository. After excluding those articles, we analyzed a random sample of the remaining articles to estimate how many and what types of invisible datasets were used in each article.
Results. About 12% of the articles explicitly mention deposition of datasets in recognized repositories, leaving 88% whose datasets are invisible. Among articles with invisible datasets, we found an average of 2.9 to 3.4 datasets per article, suggesting that approximately 200,000 to 235,000 invisible datasets were generated from NIH-funded research published in 2011. Approximately 87% of the invisible datasets consist of data newly collected for the research reported; 13% reflect reuse of existing data. More than 50% of the datasets were derived from live human or non-human animal subjects.
Conclusion. In addition to providing a rough estimate of the total number of datasets produced per year by NIH-funded researchers, this study identifies further issues that must be addressed to improve the discoverability of and access to biomedical research data: the definition of a “dataset,” determination of which (if any) data are valuable for archiving and preservation, and better methods for estimating the number of datasets of interest. Lack of consensus among annotators about the number of datasets in a given article reinforces the need for a principled way of identifying and characterizing biomedical datasets.
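As a rough back-of-envelope reading of the figures above (the implied article counts are inferred from the reported numbers, not stated in the abstract):

$$
\frac{200{,}000}{2.9} \approx \frac{235{,}000}{3.4} \approx 69{,}000 \text{ articles with invisible datasets}, \qquad \frac{69{,}000}{0.88} \approx 78{,}000 \text{ articles in the analyzed population.}
$$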
Background. The New York University Health Sciences Library data services team had developed educational material for research data management and data visualization and had been offering classes at the request of departments, research groups, and training programs, but many members of the medical center were unaware of these library data services. There were also indications of data skills gaps in these subject areas and other data-related topics.
Case Presentation. The data services team enlisted instructors from across the medical center with data expertise to teach in a series of classes hosted by the library. We hosted eight classes branded as a series called “Data Day to Day.” Seven instructors from four units in the medical center, including the library, taught the classes. A multipronged outreach approach resulted in high turnout. Evaluations indicated that attendees were very satisfied with the instruction, would use the skills learned, and were interested in future classes.
Conclusions. Data Day to Day met previously unaddressed data skills gaps. Collaborating with outside instructors allowed the library to serve as a hub for a broad range of data instruction and to raise awareness of library services. We plan to offer the series three times in the coming year with an expanding roster of classes.