As information and communication technology has become pervasive in our society, we are increasingly dependent on both digital data and the repositories that provide access to these resources and enable their use. Repositories must earn the trust of the communities they intend to serve and demonstrate that they are reliable and capable of appropriately managing the data they hold.
The integration of heterogeneous data sources and software systems is a major issue in the biomedical community, and several approaches have been explored: linking databases, "on-the-fly" integration through views, and integration through warehousing. In this paper we report on our experiences with two systems developed at the University of Pennsylvania: K2, an integration system that has primarily been used to provide views over multiple external data sources and software systems; and GUS, a data warehouse that downloads, cleans, integrates, and annotates data from multiple external data sources. Although the view and warehouse approaches each have their advantages, there is no clear "winner". To choose the best strategy for a particular application, users must therefore consider how the data will be used, what performance guarantees are required, and how much programmer time and expertise is available.
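The trade-off between the two approaches can be illustrated with a toy sketch. This is not K2 or GUS code: the two in-memory "sources", their field names, and the names `view_query` and `Warehouse` are hypothetical. The view queries every source at request time, so it always reflects current data but pays the access cost on every query; the warehouse downloads and cleans the data once and answers locally, at the price of staleness until the next reload.

```python
"""Toy contrast of view-based vs. warehouse-based integration.

All sources, field names and functions here are hypothetical stand-ins;
they do not reflect the actual K2 or GUS implementations.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    gene_id: str
    symbol: str
    source: str


# Hypothetical external sources with differing field names and conventions.
SOURCE_A = [{"id": "g1", "name": "Ins1"}, {"id": "g2", "name": "pdx1"}]
SOURCE_B = [{"accession": "g2", "gene": "Pdx1"}, {"accession": "g3", "gene": "Neurog3"}]


def fetch_a():
    # Map source A's schema onto the integrated GeneRecord schema.
    return [GeneRecord(r["id"], r["name"], "A") for r in SOURCE_A]


def fetch_b():
    # Map source B's schema onto the integrated GeneRecord schema.
    return [GeneRecord(r["accession"], r["gene"], "B") for r in SOURCE_B]


def view_query(symbol):
    """View approach: consult every source at query time (always current, no cleaning)."""
    return [rec for fetch in (fetch_a, fetch_b) for rec in fetch()
            if rec.symbol.lower() == symbol.lower()]


class Warehouse:
    """Warehouse approach: download and clean once, then answer queries locally."""

    def __init__(self):
        self.by_symbol = {}

    def load(self):
        # Rebuild the local copy from scratch on each load.
        self.by_symbol.clear()
        for fetch in (fetch_a, fetch_b):
            for rec in fetch():
                # Cleaning step: normalize symbol capitalization.
                cleaned = GeneRecord(rec.gene_id, rec.symbol.capitalize(), rec.source)
                self.by_symbol.setdefault(cleaned.symbol.lower(), []).append(cleaned)

    def query(self, symbol):
        return self.by_symbol.get(symbol.lower(), [])
```

For example, after appending a record to `SOURCE_A`, `view_query` sees it immediately, while `Warehouse.query` returns nothing for it until `load()` is called again.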
The Endocrine Pancreas Consortium was formed in late 1999 to derive and sequence cDNA libraries enriched for rare transcripts expressed in the mammalian endocrine pancreas. Over the past 3 years, the Consortium has generated 20 cDNA libraries from mouse and human pancreatic tissues and deposited >150,000 sequences into the public expressed sequence tag databases. A special effort was made to enrich for cDNAs from the endocrine pancreas by constructing libraries from isolated islets. In addition, we constructed a library in which fetal pancreas from Neurogenin 3 null mice, which consists of only exocrine and duct cells, was subtracted from fetal wild-type pancreas to enrich for the transcripts from the endocrine compartment. Sequence analysis showed that these clones cluster into 9,464 assembly groups (approximating unique transcripts) for the mouse and 13,910 for the human sequences. Of these, >4,300 were unique to Consortium libraries. We have assembled a core clone set containing one cDNA for each assembly group for the mouse and have constructed the corresponding microarray, termed "PancChip 4.0," which contains >9,000 nonredundant elements. We show that this PancChip is highly enriched for genes expressed in the endocrine pancreas. The mouse and human clone sets and corresponding arrays will be important resources for diabetes research.

Diabetes 52:1604-1610, 2003

Despite recent progress in β-cell biology and diabetes research, tools for the treatment of diabetes have not changed fundamentally. Although it is now clear that islet transplantation is a valuable therapeutic approach, this solution is severely limited by the shortage of islet tissue. Over the past decade, significant advances have been made toward identifying the hierarchy of transcription factors that govern pancreatic development (1).
In addition, it has been shown that embryonic stem cells can be differentiated in vitro toward insulin-producing cells, although the issue remains controversial (2-4). Despite these discoveries, major obstacles remain to the isolation, expansion, and differentiation of pancreatic endocrine stem and/or progenitor cells, including a lack of appropriate cell surface antibodies for sorting progenitor cell populations and only a rudimentary understanding of the β-cell lineage during development and regeneration of the pancreas.

To accelerate progress toward the identification of endocrine precursor cells and of factors that regulate the development and differentiation of β-cells, the National Institute of Diabetes and Digestive and Kidney Diseases sponsored a program entitled "Functional Genomics of the Developing Endocrine Pancreas" in 1999. The Endocrine Pancreas Consortium was created in response to this program to construct and sequence cDNA libraries derived from multiple stages of pancreatic development. Its purpose was to provide the public expressed sequence tag (EST) databases with sequences from mouse and human endocrine pancreas to discover novel transcripts that could be incorporated into custom m...
The Plasmodium falciparum Genome Database (http://PlasmoDB.org) integrates sequence information, automated analyses, and annotation data emerging from the P. falciparum genome sequencing consortium. To date, raw sequence coverage is available for >90% of the genome, and two chromosomes have been finished and annotated. Data in PlasmoDB are organized by chromosome (1-14) and can be accessed using a variety of tools for graphical and text-based browsing, or downloaded in various file formats. The GUS (Genomics Unified Schema) implementation of PlasmoDB provides a multi-species genomic relational database, incorporating data from human and mouse as well as P. falciparum. The relational schema uses a highly structured format to accommodate diverse data sets related to genomic sequence and gene expression. Tools have been designed to facilitate complex biological queries, including many that are specific to Plasmodium parasites and to malaria as a disease. Additional projects seek to integrate genomic information with the rich data sets now becoming available for RNA transcription, protein expression, metabolic pathways, genetic and physical mapping, antigenic and population diversity, and phylogenetic relationships with other apicomplexan parasites. The overall goal of PlasmoDB is to facilitate Internet- and CD-ROM-based access to both finished and unfinished sequence information by the global malaria research community.
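A multi-species relational schema of the kind described above can be sketched with Python's built-in `sqlite3` module. The table and column names, the function names, and the toy rows are invented for illustration; this is not the actual GUS schema, which is far larger. The point is that once sequences, gene features, and expression measurements share one schema keyed by taxon, a single join answers a cross-cutting biological question.

```python
import sqlite3

# Hypothetical, heavily simplified schema; NOT the real GUS tables.
SCHEMA = """
CREATE TABLE taxon (
    taxon_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
CREATE TABLE na_sequence (        -- e.g. a chromosome or contig
    seq_id   INTEGER PRIMARY KEY,
    taxon_id INTEGER NOT NULL REFERENCES taxon(taxon_id),
    name     TEXT NOT NULL
);
CREATE TABLE gene_feature (       -- annotated gene located on a sequence
    gene_id   INTEGER PRIMARY KEY,
    seq_id    INTEGER NOT NULL REFERENCES na_sequence(seq_id),
    name      TEXT NOT NULL,
    start_pos INTEGER NOT NULL,
    end_pos   INTEGER NOT NULL
);
CREATE TABLE expression (         -- one measurement per gene per condition
    gene_id   INTEGER NOT NULL REFERENCES gene_feature(gene_id),
    condition TEXT NOT NULL,
    level     REAL NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database filled with made-up toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO taxon VALUES (?, ?)",
                     [(1, "Plasmodium falciparum"), (2, "Mus musculus")])
    conn.executemany("INSERT INTO na_sequence VALUES (?, ?, ?)",
                     [(10, 1, "chromosome 2"), (11, 1, "chromosome 3"),
                      (20, 2, "mouse_contig_1")])
    conn.executemany("INSERT INTO gene_feature VALUES (?, ?, ?, ?, ?)",
                     [(100, 10, "geneA", 1000, 2500), (101, 11, "geneB", 500, 900),
                      (200, 20, "geneC", 100, 400)])
    conn.executemany("INSERT INTO expression VALUES (?, ?, ?)",
                     [(100, "trophozoite", 8.2), (101, "trophozoite", 1.1),
                      (200, "islet", 9.5)])
    return conn


def expressed_genes(conn, species, condition, min_level):
    """Genes of one species expressed at or above min_level in a condition."""
    return conn.execute("""
        SELECT g.name, s.name, e.level
        FROM gene_feature g
        JOIN na_sequence s ON s.seq_id = g.seq_id
        JOIN taxon t       ON t.taxon_id = s.taxon_id
        JOIN expression e  ON e.gene_id = g.gene_id
        WHERE t.name = ? AND e.condition = ? AND e.level >= ?
        ORDER BY e.level DESC
    """, (species, condition, min_level)).fetchall()
```

Because the species filter is just another join condition, the same query serves the parasite and the host data without species-specific tables.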
The Data Preservation Alliance for the Social Sciences (Data-PASS) is a partnership of five major U.S. institutions with a strong focus on archiving social science research. The Library of Congress supports the partnership through its National Digital Information Infrastructure and Preservation Program (NDIIPP). The goal of Data-PASS is to acquire and preserve data at risk of being lost to the research community, from opinion polls, voting records, large-scale surveys, and other social science studies. In this paper we discuss the agreements, processes, and infrastructure that provide a foundation for the collaboration.

About the Partnership

An international movement to archive, preserve, and share data emerged over forty years ago, when digital data began to appear in volume.[1] This movement is undergoing a resurgence as the social sciences shift anew toward reliance on vast amounts of digital data. Still, we cannot say that even a majority of the digital social science research content created since the revolution in sample surveys and the production of digital data has been preserved, nor that newly created data will be preserved. Why is this so? Many corporate and academic researchers assume that the data they generate are their property and that they have only limited obligations to share those data with others or to ensure their preservation. Some individual researchers are reluctant to deposit their data in archives because they fear competition. Some lack the time or expertise to prepare the metadata required for effective sharing. And some simply do not recognize the long-term value of their data. Institutional data producers may be under a legal obligation to protect proprietary information. And some data simply fall through the cracks.
A huge quantity of digital social science research content lives on, for the moment, solely as files on the computers of individual researchers or research institutions, or quite possibly on video tapes, floppy disks, punch cards, and similar media in bookcases, libraries, and warehouses. If research sponsors, producers, and data curators do not take steps to preserve it, it will be lost forever.[2] This content needs to be identified, located, assessed, acquired, processed, preserved, and shared.

[1] For a history of the early development of this community, see Margaret O. Adams, "The Origins and Early Years of IASSIST," IASSIST Quarterly 30, no. 3 (2006): 5-15.
[2] The members of this partnership represent the U.S. social science data archive tradition. There are other emerging approaches to preservation, including "self"-archiving, institutional archiving, and, more recently, virtual archiving.