Summary: CD-HIT is a widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. In response to the rapid growth in the amount of sequencing data produced by next-generation sequencing technologies, we have developed a new version of CD-HIT, accelerated with a novel parallelization strategy and several other techniques, to allow efficient clustering of such datasets. Our tests showed good speedup from parallelization for up to ∼24 cores and quasi-linear speedup for up to ∼8 cores. The enhanced CD-HIT can handle very large datasets in much less time than previous versions.
Availability: http://cd-hit.org
Contact: liwz@sdsc.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
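CD-HIT clusters sequences with a greedy incremental scheme. Sequences are sorted by decreasing length, and the longest becomes the first representative. Each later sequence is compared against the existing representatives. It joins a cluster whose representative it matches at or above an identity threshold; otherwise it becomes the representative of a new cluster.

The Python sketch below illustrates only that scheme and rests on stated assumptions:
- The function names `ungapped_identity` and `greedy_incremental_cluster` are hypothetical.
- Identity is approximated by an ungapped, position-by-position comparison over the shorter sequence. CD-HIT itself uses a short-word filter and alignment instead.
- The 0.9 default threshold is an assumption for illustration.
- The parallelization strategy described in the abstract is not modeled.

```python
from typing import Dict, List, Tuple


def ungapped_identity(a: str, b: str) -> float:
    """Fraction of identical positions, measured over the shorter sequence.

    Simplifying assumption: no gaps are allowed, so sequences are compared
    from their first residue onward. CD-HIT computes identity from an
    alignment after short-word filtering; this is only an illustration.
    """
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    # zip() stops at the end of the shorter sequence.
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / shorter


def greedy_incremental_cluster(
    seqs: Dict[str, str], threshold: float = 0.9
) -> List[Tuple[str, List[str]]]:
    """Greedy incremental clustering (hypothetical helper).

    Returns a list of (representative_id, member_ids) pairs in the order
    the clusters were created.
    """
    # Longest first, so every representative is at least as long as the
    # sequences compared against it later. sorted() is stable, so ties
    # keep the input order.
    order = sorted(seqs, key=lambda sid: len(seqs[sid]), reverse=True)
    clusters: List[Tuple[str, List[str]]] = []
    for sid in order:
        for rep_id, members in clusters:
            if ungapped_identity(seqs[sid], seqs[rep_id]) >= threshold:
                # Join the first cluster whose representative is similar enough.
                members.append(sid)
                break
        else:
            # No representative matched: this sequence starts a new cluster.
            clusters.append((sid, [sid]))
    return clusters


if __name__ == "__main__":
    demo = {
        "s1": "MKTAYIAKQRQISFVKSHFSRQ",
        "s2": "MKTAYIAKQRQISFVKSHFSRA",  # differs from s1 at one of 22 positions
        "s3": "GGGGGGGGGG",
    }
    print(greedy_incremental_cluster(demo, threshold=0.9))
    # prints [('s1', ['s1', 's2']), ('s3', ['s3'])]
```

Each sequence is compared only with cluster representatives, never with every other sequence. This is what makes the greedy approach cheaper than all-against-all comparison, at the cost of order-dependent cluster membership.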
Background: The new field of metagenomics studies microorganism communities by culture-independent sequencing. With the advances in next-generation sequencing techniques, researchers face tremendous challenges in metagenomic data analysis due to the huge quantity and high complexity of sequence data. Analyzing large datasets is extremely time-consuming. Metagenomic annotation also involves a wide range of computational tools, which are difficult for typical users to install and maintain. The tools provided by the few available web servers are also limited and have various constraints, such as login requirements, long waiting times and the inability to configure pipelines.
Results: We developed WebMGA, a customizable web server for fast metagenomic analysis. WebMGA includes over 20 commonly used tools, such as ORF calling, sequence clustering, quality control of raw reads, removal of sequencing artifacts and contaminations, taxonomic analysis and functional annotation. WebMGA provides users with rapid metagenomic data analysis using fast and effective tools, which have been implemented to run in parallel on our local computer cluster. Users can access WebMGA through web browsers or programming scripts to perform individual analyses or to configure and run customized pipelines. WebMGA is freely available at http://weizhongli-lab.org/metagenomic-analysis.
Conclusions: WebMGA offers researchers many fast and unique tools and great flexibility for complex metagenomic data analysis.
In addition to protein-coding genes, the human genome produces a large number of noncoding RNAs, including microRNAs and long noncoding RNAs (lncRNAs). Both microRNAs and lncRNAs have been shown to play critical roles in the regulation of cellular processes such as cell growth and apoptosis, as well as in cancer progression and metastasis. Although it is well known that microRNAs can target a large number of protein-coding genes, little is known about whether microRNAs can also target lncRNAs. In the present study, we determine whether miR-21 can regulate lncRNA expression. Using an lncRNA RT-PCR (reverse transcription-polymerase chain reaction) array carrying 83 human disease-related lncRNAs, we show that miR-21 is capable of suppressing the lncRNA growth arrest-specific 5 (GAS5). This negative correlation between miR-21 and GAS5 is also seen in breast tumor specimens. Interestingly, GAS5 can also repress miR-21 expression: ectopic expression of GAS5 suppresses miR-21 expression, whereas GAS5-siRNA increases it. Importantly, there is a putative miR-21-binding site in exon 4 of GAS5, and deleting this binding site abolishes this activity. Experiments with in vitro cell culture and a xenograft mouse model suggest that GAS5 functions as a tumor suppressor. We further show that a biotin-labeled GAS5-RNA probe is able to pull down the key component (AGO2) of the RNA-induced silencing complex (RISC). We subsequently identify miR-21 in this GAS5-RISC complex, implying that miR-21 and GAS5 may regulate each other in a way similar to the microRNA-mediated silencing of target mRNAs. Together, these results suggest that miR-21 targets not only tumor-suppressive protein-coding genes but also the lncRNA GAS5.
Traditional approaches to protein-protein docking sample the binding modes with no regard to similar experimentally determined structures (templates) of protein-protein complexes. Emerging template-based docking approaches use such similar complexes to determine the docking predictions. The docking problem assumes that the structures of the participating proteins are known, which makes it possible to align them with the structures of the template complexes. The progress in the development of template-based docking, and the vast experience in template-based modeling of individual proteins, show that such approaches are generally more reliable than free modeling. The key aspect of this modeling paradigm is the availability of templates. The current common perception is that, because of the difficulties in experimental structure determination of protein-protein complexes, the pool of docking templates is insignificant, and a broad application of template-based docking will therefore be possible only at some future time. The results of our large-scale, systematic study show that, surprisingly, despite the limited number of protein-protein complexes in the Protein Data Bank, docking templates can be found for complexes representing almost all the known protein-protein interactions, provided the components themselves have a known structure or can be homology-built. About one-third of the templates are of good quality when compared to experimental structures in test sets extracted from the Protein Data Bank and would be useful starting points in modeling the complexes.
This finding dramatically expands our ability to model protein interactions and has far-reaching implications for the protein docking field in general.
Keywords: protein modeling | protein recognition | structural bioinformatics | structure alignment
Protein-protein interactions (PPI) are a key component of life processes at the molecular level, and the number detected in genome-wide studies is growing fast. We want to understand their properties and be able to manipulate them for structure-based drug design. For this purpose, we must characterize PPI structurally, but studying them by X-ray and NMR methods is demanding and slow, so computational methods appear to be a necessary complement. The structural predictions of PPI generally rely on docking procedures that can be roughly divided into two groups. The first is (i) template-free docking, where many or all of the possible binding modes of two proteins are explored with no a priori knowledge of the structure of the complex. The second is (ii) template-based docking, where similarity to previously known complexes determines the prediction. Template-free docking methods rely on the geometric and physicochemical complementarity of the protein surfaces (1), now often supplemented by statistical potentials (2, 3) and subject to a variety of constraints (4). Template-free modeling can also be applied to the prediction of domain-domain structures (5, 6). The Critical Assessment of Predicted Interactions (CAPRI) blind predi...