SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads

Xie, Yinlong; Wu, Gengxiong; Tang, Jingbo; Luo, Ruibang; Patterson, Jordan; Liu, Shanlin; Huang, Weihua; He, Guangzhu; Gu, Shengchang; Li, Shengkang; Zhou, Xin; Lam, Tak Wah; Li, Yingrui; Xu, Xun; Wong, Gane Ka‐Shu; Wang, Jun

doi:10.1093/bioinformatics/btu077

Cited by 829 publications

(665 citation statements)

References 18 publications

Supporting

Mentioning

645

Contrasting

Unclassified

“…RNA-seq data derived from nine different libraries was de novo assembled using Trinity [67] and SOAPdenovo-trans [68]. For each library, the assembly with the highest N50 value was chosen to annotate the genes.…”

Section: Author Contributions (mentioning)

confidence: 99%

High-quality de novo assembly of the apple genome and methylome dynamics of early fruit development

et al. 2017

Using the latest sequencing and optical mapping technologies, we have produced a high-quality de novo assembly of the apple (Malus domestica Borkh.) genome. Repeat sequences, which represented over half of the assembly, provided an unprecedented opportunity to investigate the uncharacterized regions of a tree genome; we identified a new hyper-repetitive retrotransposon sequence that was over-represented in heterochromatic regions and estimated that a major burst of different transposable elements (TEs) occurred 21 million years ago. Notably, the timing of this TE burst coincided with the uplift of the Tian Shan mountains, which is thought to be the center of the location where the apple originated, suggesting that TEs and associated processes may have contributed to the diversification of the apple ancestor and possibly to its divergence from pear. Finally, genome-wide DNA methylation data suggest that epigenetic marks may contribute to agronomically relevant aspects, such as apple fruit development.
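The citation statement above describes keeping, for each library, the assembly with the highest N50. A minimal sketch of that selection rule follows, assuming N50 is computed in the usual way (the length of the contig at which the cumulative length, taken from longest to shortest, first reaches half of the total). The function names `n50` and `pick_best_assembly` and the toy contig lengths are hypothetical and are not taken from the cited study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the total is the N50 contig.
        if running >= half_total:
            return length


def pick_best_assembly(assemblies):
    """Return the name of the assembly with the highest N50.

    `assemblies` maps an assembly name to a list of contig lengths.
    """
    return max(assemblies, key=lambda name: n50(assemblies[name]))


# Made-up toy lengths, for illustration only.
toy_assemblies = {
    "assembly_a": [100, 200, 300, 400],
    "assembly_b": [50, 50, 400, 500],
}
best = pick_best_assembly(toy_assemblies)
```

N50 rewards contiguity only, so a comparison like this says nothing about completeness or misassembly rates.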

“…The k-mers are subsets of contiguous, overlapping nucleotides of a defined length (k) that are generated during the assembly of short-read sequences. Our initial investigations of the protein predictions provided by the 1KP Consortium, which employed 25-mers (Xie et al, 2014), resulted in limited recovery of HRGP sequences, particularly the repetitive CL-EXTs and PRPs (Johnson et al, 2017).…”

Section: Using the Maab Pipeline On 1kp Transcriptomic Data Sets: Mul… (mentioning)

confidence: 99%

“…using targeted assembly methods or scaffolding using genomic resources). Preliminary analyses (Johnson et al, 2017) used the 1KP Consortium's k-mer = 25 assembly (Xie et al, 2014), whereas all subsequent analyses used the multiple k-mer data generated as summarized here. Sample read sets were assembled with Oases (Schulz et al, 2012) using four different k-mers (39, 49, 59, and 69) and open reading frames identified using getorf from the EMBOSS toolkit (http://emboss.sourceforge.net/).…”

Section: Data Sets (mentioning)

confidence: 99%

Pipeline to Identify Hydroxyproline-Rich Glycoproteins

et al. 2017

Intrinsically disordered proteins (IDPs) are functional proteins that lack a well-defined three-dimensional structure. The study of IDPs is a rapidly growing area as the crucial biological functions of more of these proteins are uncovered. In plants, IDPs are implicated in plant stress responses, signaling, and regulatory processes. A superfamily of cell wall proteins, the hydroxyproline-rich glycoproteins (HRGPs), have characteristic features of IDPs. Their protein backbones are rich in the disordering amino acid proline, they contain repeated sequence motifs and extensive posttranslational modifications (glycosylation), and they have been implicated in many biological functions. HRGPs are evolutionarily ancient, having been isolated from the protein-rich walls of chlorophyte algae to the cellulose-rich walls of embryophytes. Examination of HRGPs in a range of plant species should provide valuable insights into how they have evolved. Commonly divided into the arabinogalactan proteins, extensins, and proline-rich proteins, in reality, a continuum of structures exists within this diverse and heterogeneous superfamily. An inability to accurately classify HRGPs leads to inconsistent gene ontologies limiting the identification of HRGP classes in existing and emerging omics data sets. We present a novel and robust motif and amino acid bias (MAAB) bioinformatics pipeline to classify HRGPs into 23 descriptive subclasses. MAAB was validated using available genomic resources and then applied to the 1000 Plants transcriptome project (www.onekp.com) data set. Significant improvement in the detection of HRGPs using multiple k-mer transcriptome assembly methodology was observed. The MAAB pipeline is readily adaptable and can be modified to optimize the recovery of IDPs from other organisms.
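The citation statements for this paper define k-mers as contiguous, overlapping substrings of length k and contrast a single k = 25 assembly with assemblies at k = 39, 49, 59, and 69. The sketch below only illustrates that definition; it is not the assemblers' implementation, and the function name `kmers` and the 100-base read length are assumptions made for the example.

```python
def kmers(sequence, k):
    """Return every contiguous, overlapping substring of length k, in order."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # A sequence of length n yields n - k + 1 k-mers (none if n < k).
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


example = kmers("ACGTAC", 3)

# Number of k-mers a hypothetical 100-base read yields at each k named in the text.
read = "A" * 100
counts = {k: len(kmers(read, k)) for k in (25, 39, 49, 59, 69)}
```

Each read yields fewer k-mers as k grows, which is one reason assemblies built at different k values can recover different transcripts.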

“…Specifically, raw data from sequencing experiments is either assembled (Illumina) using assemblers such as SOAPdenovo (Xie et al, 2014) or Trinity (Grabherr et al, 2011) or filtered based on the raw read quality score (454-pyrosequencing) using programs such as QTrim (Shrestha et al, 2014) or NGS QC Toolkit (Patel and Jain, 2012). In our pipeline, a stringent quality control score of 30 is used to remove low quality reads.…”

Section: Sequence Analysis Pipeline (mentioning)

confidence: 99%

An efficient transcriptome analysis pipeline to accelerate venom peptide discovery and characterisation

Prashanth and Lewis 2015, Toxicon

Please cite this article as: Prashanth, J.R., Lewis, R.J., An efficient transcriptome analysis pipeline to accelerate venom peptide discovery and characterisation, Toxicon (2015), doi: 10.1016/j.toxicon.2015.09.012.

Abstract: Transcriptome sequencing is now widely adopted as an efficient means to study the chemical diversity of venoms. To improve the efficiency of analysis of these large datasets, we have optimised an analysis pipeline for cone snail venom gland transcriptomes. The pipeline combines ConoSorter with sequence architecture-based elimination and similarity searching using BLAST to improve the accuracy of sequence identification and classification, while reducing requirements for manual intervention. As a proof-of-concept, we used this approach to reanalyse three previously published cone snail transcriptomes from diverse dietary groups. Our pipeline method generated similar results to the published studies with significantly less manual intervention. We additionally found undiscovered sequences in the piscivorous C. geographus and vermivorous C. miles and identified sequences in incorrect superfamilies in the molluscivorous C. marmoreus and C. geographus transcriptomes. Our results indicate that this method can improve toxin detection without extending analysis time. While this method was evaluated on cone snail transcriptomes it can be easily optimised to retrieve toxins from other venomous animals.

SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads

Abstract: Source code and user manual are available at http://sourceforge.net/projects/soapdenovotrans/.
