While the genomes of many organisms have been sequenced over the last few years, transforming such raw sequence data into knowledge remains a hard task. A great number of prediction programs have been developed that try to address one part of this problem, which consists of locating the genes along a genome. This paper reviews the existing approaches to predicting genes in eukaryotic genomes and underlines their intrinsic advantages and limitations. The main mathematical models and computational algorithms adopted are also briefly described and the resulting software classified according to both the method and the type of evidence used. Finally, the several difficulties and pitfalls encountered by the programs are detailed, showing that improvements are needed and that new directions must be considered.
We have recently described an evolutionarily conserved protein motif, designated the THAP domain, which defines a previously uncharacterized family of cellular factors (THAP proteins). The THAP domain exhibits similarities to the site-specific DNA-binding domain of Drosophila P element transposase, including a putative metal-coordinating C2CH signature (CX2-4CX35-53CX2H). In this article, we report a comprehensive list of ≈100 distinct THAP proteins in model animal organisms, including human nuclear proapoptotic factors THAP1 and DAP4/THAP0, transcriptional repressor THAP7, zebrafish orthologue of cell cycle regulator E2F6, and Caenorhabditis elegans chromatin-associated protein HIM-17 and cell-cycle regulators LIN-36 and LIN-15B. In addition, we demonstrate the biochemical function of the THAP domain as a zinc-dependent sequence-specific DNA-binding domain belonging to the zinc-finger superfamily. In vitro binding-site selection allowed us to identify an 11-nucleotide consensus DNA-binding sequence specifically recognized by the THAP domain of human THAP1. Mutations of single nucleotide positions in this sequence abrogated THAP-domain binding. Experiments with the zinc chelator 1,10-o-phenanthroline revealed that the THAP domain is a zinc-dependent DNA-binding domain. Site-directed mutagenesis of single cysteine or histidine residues supported a role for the C2CH motif in zinc coordination and DNA-binding activity. The four other conserved residues (P, W, F, and P), which define the THAP consensus sequence, were also found to be required for DNA binding. Together with previous genetic data obtained in C. elegans, our results suggest that cellular THAP proteins may function as zinc-dependent sequence-specific DNA-binding factors with roles in proliferation, apoptosis, cell cycle, chromosome segregation, chromatin modification, and transcriptional regulation.
protein motif | zinc finger | Caenorhabditis elegans | cell cycle
We have recently described an evolutionarily conserved ≈90-residue protein motif, designated the THAP domain, which defines a previously uncharacterized family of cellular factors, the THAP proteins (1, 2). This motif is characterized by a putative metal-coordinating C2CH module (CX2-4CX35-53CX2H) and four additional invariant residues, P26, W36, F58, and P78, in human THAP1 (Fig. 1). The THAP domain was found to be restricted to animals and is present in both vertebrates (from zebrafish to humans) and invertebrates (e.g., fly and worm) (1). Interestingly, the THAP-motif signature was identified (1) in the site-specific DNA-binding domain of Drosophila melanogaster P element transposase (3). This finding suggested that the THAP domain may constitute an example of a DNA-binding domain shared between cellular proteins and transposases from mobile genomic parasites and that the THAP proteins may correspond to a previously uncharacterized family of cellular DNA-binding proteins (1).
In humans, the THAP family comprises 12 distinct members, including nuclear proapoptotic factor THAP1 (2), death-...
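The C2CH spacing signature quoted above (CX2-4CX35-53CX2H) can be read as a pattern over a protein sequence: a cysteine, 2 to 4 residues of any kind, a cysteine, 35 to 53 residues, a cysteine, 2 residues, and a histidine. The following is a minimal Python sketch of that reading, not the authors' search method. The function name `find_c2ch` and the test sequence are hypothetical, and the sequence is synthetic rather than a real THAP protein. A hit only indicates the cysteine/histidine spacing; it says nothing about the four other conserved residues (P, W, F, P) that the abstract reports as required for DNA binding.

```python
import re

# Spacing signature from the text: C X(2-4) C X(35-53) C X(2) H,
# where X is any residue. Written here as a regular expression over
# one-letter amino acid codes (assumption: uppercase input).
C2CH_PATTERN = re.compile(r"C[A-Z]{2,4}C[A-Z]{35,53}C[A-Z]{2}H")


def find_c2ch(sequence):
    """Return (start, end) index pairs of non-overlapping C2CH-like matches.

    Hypothetical helper for illustration; indices are 0-based and the
    end index is exclusive, as with Python slices.
    """
    return [match.span() for match in C2CH_PATTERN.finditer(sequence.upper())]


# Synthetic example, not a real protein: two cysteines 2 residues apart,
# a 40-residue spacer, then C, two residues, and H.
synthetic = "M" + "C" + "AA" + "C" + "G" * 40 + "C" + "AA" + "H" + "K"
hits = find_c2ch(synthetic)
print(hits)
```

Because the pattern only encodes spacing, it will also match unrelated cysteine-rich sequences; a profile-based search would be the more usual choice for identifying family members.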
We recently cloned a novel human nuclear factor (designated THAP1) from postcapillary venule endothelial cells (ECs) that contains a DNA-binding THAP domain, shared with zebrafish E2F6 and several Caenorhabditis elegans proteins interacting genetically with retinoblastoma gene product (pRB). Here, we show that THAP1 is a physiologic regulator of EC proliferation and cell-cycle progression, 2 essential processes for angiogenesis. Retroviral-mediated gene transfer of THAP1 into primary human ECs inhibited proliferation, and large-scale expression profiling with microarrays revealed that THAP1-mediated growth inhibition is due to coordinated repression of pRB/E2F cell-cycle target genes. Silencing of endogenous THAP1 through RNA interference similarly inhibited EC proliferation and G1/S cell-cycle progression, and resulted in down-regulation of several pRB/E2F cell-cycle target genes, including RRM1, a gene required for S-phase DNA synthesis. Chromatin immunoprecipitation assays in proliferating ECs showed that endogenous THAP1 associates in vivo with a consensus THAP1-binding site found in the RRM1 promoter, indicating that RRM1 is a direct transcriptional target of THAP1. The similar phenotypes observed after THAP1 overexpression and silencing suggest that an optimal range of THAP1 expression is essential for EC proliferation. Together, these data provide the first links in mammals among THAP proteins, cell proliferation, and pRB/E2F cell-cycle pathways. (Blood. 2007; 109:584-594)
A pathosystem between Aphanomyces euteiches, the causal agent of pea root rot disease, and the model legume Medicago truncatula was developed to gain insights into mechanisms involved in resistance to this oomycete. The F83005.5 French accession and the A17-Jemalong reference line, susceptible and partially resistant, respectively, to A. euteiches, were selected for further cytological and genetic analyses. Microscopy analyses of thin root sections revealed that a major difference between the two inoculated lines occurred in the root stele, which remained pathogen free in A17. Striking features were observed in A17 roots only, including i) frequent pericycle cell divisions, ii) lignin deposition around the pericycle, and iii) accumulation of soluble phenolic compounds. Genetic analysis of resistance was performed on an F7 population of 139 recombinant inbred lines and identified a major quantitative trait locus (QTL) near the top of chromosome 3. A second study, of near-isogenic line responses to A. euteiches, confirmed the role of this QTL in expression of resistance. Fine-mapping allowed the identification of a 135-kb sequenced genomic DNA region rich in proteasome-related genes. Most of these genes were shown to be induced only in inoculated A17. Novel mechanisms possibly involved in the observed partial resistance are proposed.
The PeroxiBase (http://peroxibase.toulouse.inra.fr/) is a specialized database devoted to peroxidase families, which are major actors of stress responses. In addition to the increasing number of sequences and the complete modification of the Web interface, new analysis tools and functionalities have been developed since the previous publication in the NAR database issue. Nucleotide sequences and graphical representation of the gene structure can now be included for entries containing genomic cross-references. An expert semi-automatic annotation strategy is being developed to generate new entries from genomic sequences and from EST libraries. In addition, new internal and automatic controls have been included to improve the quality of the entries. To compare gene structure organization among family members, two new tools are available: CIWOG to detect common introns and GECA to visualize gene structure overlaid with sequence conservation. The multicriteria search tool was greatly improved to allow simple and combined queries. After such queries or a BLAST search, different analysis processes are suggested, such as multiple alignments with ClustalW or MAFFT, a platform for phylogenetic analysis, and GECA's display in association with a phylogenetic tree. Finally, we updated our family-specific profiles implemented in the PeroxiScan tool and made new profiles to consider new sub-families.