Open reading frame expressed sequence tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription–PCR, offers a direct route to transcript finishing that may be a useful alternative to full-length cDNA cloning.
Haptoglobin (Hp) is a plasma glycoprotein, the main biological function of which is to bind free hemoglobin (Hb) and prevent the loss of iron and subsequent kidney damage following intravascular hemolysis. Haptoglobin is also a positive acute-phase protein with immunomodulatory properties. In humans, the HP locus is polymorphic, with two codominant alleles (HP1 and HP2) that yield three distinct genotypes/phenotypes (Hp1-1, Hp2-1 and Hp2-2). The corresponding proteins have structural and functional differences that may influence the susceptibility and/or outcome in several diseases. This article summarizes the available data on the structure and functions of Hp and the possible effects of Hp polymorphism in a number of important human disorders.
Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22, with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 of the 148 (30.4%) EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. Of these, 171 were in fact also defined by EST or full length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTES coincided with DNA regions predicted as encoding exons by GENSCAN (http://genes.mit.edu/GENSCAN.html).
Complete bacterial genome sequences allow a relatively precise and complete analysis of constituent genes and coding regions by means of direct computational analysis (1). In complex eukaryotic genomes, however, it is proving considerably more difficult to identify genes because of their fragmentation into multiple small exons divided by often considerably larger introns. In this context, the determination of the complete sequence of the human chromosome 22 allowed a detailed appraisal of the efficacy of gene prediction methodologies (2).
It was noted that when known genes (where complete cDNA sequences have been determined) were compared with an ab initio prediction of the same region by using the best computational methods available, only 94% of annotated genes were detected. More importantly, in only 20% of cases were all exons exactly predicted, and 16% of all known exons were entirely missed. On the other hand, almost 40% of GENSCAN-predicted genes did not form part of any gene confirmed by other means and include an unknown proportion of false positives (2). In the absence of adequate computational approaches, gene identification will depend on the alignment of finished genomic sequence with sequences from experimentally validated transcripts. Following this approach, Dunham and colleagues (2) were able to identify 247 genes corresponding to fully sequenced transcripts on chromosome 22 that they have denominated