Blood cell production originates from a rare population of multipotent, self-renewing stem cells. A genome-wide gene expression analysis was performed in order to define regulatory pathways in stem cells as well as their global genetic program. Subtracted complementary DNA libraries from highly purified murine fetal liver stem cells were analyzed with bioinformatic and array hybridization strategies. A large percentage of the several thousand gene products that have been characterized correspond to previously undescribed molecules with properties suggestive of regulatory functions. The complete data, available in a biological process-oriented database, represent the molecular phenotype of the hematopoietic stem cell.
Factor VII is a vitamin K-dependent coagulation protein essential for proper hemostasis. The human Factor VII gene spans 13 kilobase pairs and is located on chromosome 13 just 2.8 kilobase pairs 5′ to the Factor X gene. In this report, we show that Factor VII transcripts are restricted to the liver and that steady state levels of mRNA are much lower than those of Factor X. The major transcription start site is mapped at −51 by RNase protection assay and primer extension experiments. The first 185 base pairs 5′ of the translation start site are sufficient to confer maximal promoter activity in HepG2 cells. Protein binding sites are identified at nucleotides −51 to −32, −63 to −58, −108 to −84, and −233 to −215 by DNase I footprint analysis and gel mobility shift assays. A liver-enriched transcription factor, hepatocyte nuclear factor-4 (HNF-4), and a ubiquitous transcription factor, Sp1, are shown to bind within the first 108 base pairs of the promoter region at nucleotide sequences ACTTTG and CCCCTCCCCC, respectively. The importance of these binding sites in promoter activity is demonstrated through independent functional mutagenesis experiments, which show dramatically reduced promoter activity. Transactivation studies with an HNF-4 expression plasmid in HeLa cells also demonstrate the importance of HNF-4 in promoting transcription in nonhepatocyte-derived cells. Additionally, the sequence of a naturally occurring allele containing a previously described decanucleotide insert polymorphism at −323 is shown to reduce promoter activity by 33% compared with the more common allelic sequence.
To accelerate gene discovery and facilitate genetic mapping in the protozoan parasite Toxoplasma gondii, we have generated >7000 new ESTs from the 5′ ends of randomly selected tachyzoite cDNAs. Comparison of the ESTs with the existing gene databases identified possible functions for more than 500 new T. gondii genes by virtue of sequence motifs shared with conserved protein families, including factors involved in transcription, translation, protein secretion, signal transduction, cytoskeleton organization, and metabolism. Despite this success in identifying new genes, more than 50% of the ESTs correspond to genes of unknown function, reflecting the divergent evolutionary status of this parasite. A newly recognized class of genes was identified based on its similarity to sequences known only from other members of the same phylum, therefore identifying sequences that are apparently restricted to the Apicomplexa. Such genes may underlie pathways common to this group of medically important parasites, therefore identifying potential targets for intervention.
The integration of heterogeneous data sources and software systems is a major issue in the biomedical community and several approaches have been explored: linking databases, "on-the-fly" integration through views, and integration through warehousing. In this paper we report on our experiences with two systems that were developed at the University of Pennsylvania: an integration system called K2, which has primarily been used to provide views over multiple external data sources and software systems; and a data warehouse called GUS which downloads, cleans, integrates and annotates data from multiple external data sources. Although the view and warehouse approaches each have their advantages, there is no clear "winner". Therefore, users must consider how the data is to be used, what the performance guarantees must be, and how much programmer time and expertise is available to choose the best strategy for a particular application.
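The contrast between the view and warehouse approaches described in this abstract can be illustrated with a minimal Python sketch. It is not how K2 or GUS is implemented; the two in-memory "sources", the gene identifiers, and the function names `view_lookup` and `build_warehouse` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional

# Two hypothetical external sources, keyed by a made-up gene identifier.
sequence_source: Dict[str, str] = {"gene1": "ACGT", "gene2": "GGCC"}
annotation_source: Dict[str, str] = {"gene1": " kinase ", "gene2": "transporter"}


def integrate(gene_id: str) -> Dict[str, str]:
    """Combine one record from both sources, with light cleaning."""
    return {
        "id": gene_id,
        "sequence": sequence_source[gene_id],
        "annotation": annotation_source.get(gene_id, "").strip(),
    }


def view_lookup(gene_id: str) -> Optional[Dict[str, str]]:
    """View approach: query the sources at request time.

    Results are always current, but every query pays the cost of
    reaching the sources and integrating on the fly.
    """
    if gene_id not in sequence_source:
        return None
    return integrate(gene_id)


def build_warehouse() -> Dict[str, Dict[str, str]]:
    """Warehouse approach: copy, clean and integrate everything once.

    Queries against the copy are fast and local, but the copy goes
    stale until the warehouse is rebuilt.
    """
    return {gene_id: integrate(gene_id) for gene_id in sequence_source}


warehouse = build_warehouse()

# A source changes after the warehouse was built.
annotation_source["gene1"] = "tyrosine kinase"

live = view_lookup("gene1")   # reflects the updated source
stored = warehouse["gene1"]   # still holds the value from load time
```

In this toy setting the view returns the updated annotation while the warehouse keeps the older, cleaned copy, which mirrors the trade-off between freshness and query-time cost that the abstract says users must weigh.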
Scientific data of importance to biologists reside in a number of different data sources, such as GenBank, GSDB, SWISS-PROT, EMBL, and OMIM, among many others. Some of these data sources are conventional databases implemented using database management systems (DBMSs) and others are structured files maintained in a number of different formats (e.g., ASN.1 and ACE). In addition, software packages such as sequence analysis packages (e.g., BLAST and FASTA) produce data and can therefore be viewed as data sources. To counter the increasing dispersion and heterogeneity of data, different approaches to integrating these data sources are appearing throughout the bioinformatics community. This paper surveys the technical challenges to integration, classifies the approaches, and critiques the available tools and methodologies.