Next generation sequencing (NGS) technologies provide a high-throughput means to generate large amount of sequence data. However, quality control (QC) of sequence data generated from these technologies is extremely important for meaningful downstream analysis. Further, highly efficient and fast processing tools are required to handle the large volume of datasets. Here, we have developed an application, NGS QC Toolkit, for quality check and filtering of high-quality data. This toolkit is a standalone and open source application freely available at http://www.nipgr.res.in/ngsqctoolkit.html. All the tools in the application have been implemented in Perl programming language. The toolkit is comprised of user-friendly tools for QC of sequencing data generated using Roche 454 and Illumina platforms, and additional tools to aid QC (sequence format converter and trimming tools) and analysis (statistics tools). A variety of options have been provided to facilitate the QC at user-defined parameters. The toolkit is expected to be very useful for the QC of NGS data to facilitate better downstream analysis.
We provide a protocol for precision nuclear run-on sequencing (PRO-seq) and its variant, PRO-cap, which map the location of active RNA polymerases (PRO-seq) or transcription start sites (TSSs) (PRO-cap) genome-wide at high resolution. The density of RNA polymerases at a particular genomic locus directly reflects the level of nascent transcription at that region. Nuclei are isolated from cells and, under nuclear run-on conditions, transcriptionally engaged RNA polymerases incorporate one or, at most, a few biotin-labeled nucleotide triphosphates (biotin-NTPs) into the 3′ end of nascent RNA. The biotin-labeled nascent RNA is used to prepare sequencing libraries, which are sequenced from the 3′ end to provide high-resolution positional information for the RNA polymerases. PRO-seq provides much higher sensitivity than ChIP-seq, and it generates a much larger fraction of usable sequence reads than ChIP-seq or NET-seq (native elongating transcript sequencing). Similarly to NET-seq, PRO-seq maps the RNA polymerase at up to base-pair resolution with strand specificity, but unlike NET-seq it does not require immunoprecipitation. With the protocol provided here, PRO-seq (or PRO-cap) libraries for high-throughput sequencing can be generated in 4–5 working days. The method has been applied to human, mouse, Drosophila melanogaster and Caenorhabditis elegans cells and, with slight modifications, to yeast.
SUMMARYCicer arietinum L. (chickpea) is the third most important food legume crop. We have generated the draft sequence of a desi-type chickpea genome using next-generation sequencing platforms, bacterial artificial chromosome end sequences and a genetic map. The 520-Mb assembly covers 70% of the predicted 740-Mb genome length, and more than 80% of the gene space. Genome analysis predicts the presence of 27 571 genes and 210 Mb as repeat elements. The gene expression analysis performed using 274 million RNA-Seq reads identified several tissue-specific and stress-responsive genes. Although segmental duplicated blocks are observed, the chickpea genome does not exhibit any indication of recent whole-genome duplication. Nucleotide diversity analysis provides an assessment of a narrow genetic base within the chickpea cultivars. We have developed a resource for genetic markers by comparing the genome sequences of one wild and three cultivated chickpea genotypes. The draft genome sequence is expected to facilitate genetic enhancement and breeding to develop improved chickpea varieties.
Chickpea ranks third among the food legume crops production in the world. However, the genomic resources available for chickpea are still very limited. In the present study, the transcriptome of chickpea was sequenced with short reads on Illumina Genome Analyzer platform. We have assessed the effect of sequence quality, various assembly parameters and assembly programs on the final assembly output. We assembled ∼107million high-quality trimmed reads using Velvet followed by Oases with optimal parameters into a non-redundant set of 53 409 transcripts (≥100 bp), representing about 28 Mb of unique transcriptome sequence. The average length of transcripts was 523 bp and N50 length of 900 bp with coverage of 25.7 rpkm (reads per kilobase per million). At the protein level, a total of 45 636 (85.5%) chickpea transcripts showed significant similarity with unigenes/predicted proteins from other legumes or sequenced plant genomes. Functional categorization revealed the conservation of genes involved in various biological processes in chickpea. In addition, we identified simple sequence repeat motifs in transcripts. The chickpea transcripts set generated here provides a resource for gene discovery and development of functional molecular markers. In addition, the strategy for de novo assembly of transcriptome data presented here will be helpful in other similar transcriptome studies.
Chickpea (Cicer arietinum) is an important food legume crop but lags in the availability of genomic resources. In this study, we have generated about 2 million high-quality sequences of average length of 372 bp using pyrosequencing technology. The optimization of de novo assembly clearly indicated that hybrid assembly of long-read and short-read primary assemblies gave better results. The hybrid assembly generated a set of 34,760 transcripts with an average length of 1,020 bp representing about 4.8% (35.5 Mb) of the total chickpea genome. We identified more than 4,000 simple sequence repeats, which can be developed as functional molecular markers in chickpea. Putative function and Gene Ontology terms were assigned to at least 73.2% and 71.0% of chickpea transcripts, respectively. We have also identified several chickpea transcripts that showed tissue-specific expression and validated the results using real-time polymerase chain reaction analysis. Based on sequence comparison with other species within the plant kingdom, we identified two sets of lineage-specific genes, including those conserved in the Fabaceae family (legume specific) and those lacking significant similarity with any non chickpea species (chickpea specific). Finally, we have developed a Web resource, Chickpea Transcriptome Database, which provides public access to the data and results reported in this study. The strategy for optimization of de novo assembly presented here may further facilitate the transcriptome sequencing and characterization in other organisms. Most importantly, the data and results reported in this study will help to accelerate research in various areas of genomics and implementing breeding programs in chickpea.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.