2012
DOI: 10.1186/1471-2164-13-92

Cutoffs and k-mers: implications from a transcriptome study in allopolyploid plants

Abstract: Background: Transcriptome analysis is increasingly being used to study the evolutionary origins and ecology of non-model plants. One issue for both transcriptome assembly and differential gene expression analyses is the common occurrence in plants of hybridisation and whole genome duplication (WGD), resulting in allopolyploidy. The divergence of duplicated genes following WGD creates near-identical homeologues that can be problematic for de novo assembly and also reference-based assembly protoco…

Citations: cited by 57 publications (65 citation statements)
References: references 52 publications
“…The final dataset used for de novo transcriptome assembly was 38.9% of its original size. The number of contigs in each assembly, spanning k-mers 33 to 89, was inversely related to k-mer size (Table 2), which is consistent with previous observations (17,18,21,28). The optimum k-mer sizes appeared to be 57 and 65, which had the highest values for most metrics typically used to assess assembly quality (e.g., mean and median contig lengths, N50, and N90), maximum contig length the exception.…”
Section: Results
Citation type: supporting
confidence: 88%
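The statement above compares assemblies by N50 and N90. As a minimal sketch of how such metrics are commonly computed (the function names `nx` and `n50` and the example contig lengths are hypothetical, and this is not taken from the cited or citing papers), N50 is the length of the contig at which the cumulative length of contigs, sorted longest first, reaches half of the total assembly length:

```python
def nx(contig_lengths, fraction):
    """Return the Nx statistic for a list of contig lengths.

    Contigs are sorted longest first; the result is the length of the
    contig at which the running total first reaches `fraction` of the
    total assembly length (fraction=0.5 gives N50, 0.9 gives N90).
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    ordered = sorted(contig_lengths, reverse=True)
    threshold = fraction * sum(ordered)

    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= threshold:
            return length
    # Only reachable through floating-point rounding; fall back to the shortest contig.
    return ordered[-1]


def n50(contig_lengths):
    """Convenience wrapper: N50 is Nx with fraction 0.5."""
    return nx(contig_lengths, 0.5)


# Hypothetical contig lengths, for illustration only.
example_lengths = [100, 200, 300, 400, 500]
example_n50 = n50(example_lengths)
example_n90 = nx(example_lengths, 0.9)
```

Because N90 requires covering more of the assembly, it is never larger than N50 for the same set of contigs, which is one reason both are often reported together.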
“…Removing redundancies from the 1KP sequence data was necessary because, like other short-read sequence data, these offers a wealth of information, but redundancy can occur from multiple sources, including splice variants, sequencing errors, recent duplications, chimeric transcripts, and polyploidy (Gruenheit et al, 2012; Yang and Smith, 2013; Xie et al, 2014). Sequence redundancies were obvious in our analysis (Supplemental Fig.…”
Section: Reducing Redundancy In 1kp Data Sets
Citation type: mentioning
confidence: 96%
“…Initially, the assembled transcriptome was highly enriched for small-sized transcripts, similar to other de novo-assembled plant transcriptomes, which could in part be due to the transcript assembly algorithm, where the reads are decomposed into k-mers that may report hypothetical and misassembled transcripts (Zhao et al, 2011; Gruenheit et al, 2012). Filtering out transcripts based on abundance estimation and clustering by sequence identity would have likely removed these assembly artifacts, including transcript variants that were poorly supported by the reads.…”
Section: Transcriptome Assembly and Annotation
Citation type: mentioning
confidence: 99%
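The statement above refers to reads being decomposed into k-mers. A minimal sketch of that decomposition step, assuming reads are plain strings (the function names `kmers` and `count_kmers` and the example reads are hypothetical; real assemblers use far more compact data structures and also handle reverse complements and sequencing errors):

```python
from collections import Counter


def kmers(read, k):
    """Return every overlapping substring of length k in a read.

    A read of length n yields n - k + 1 k-mers; a read shorter than k
    yields an empty list.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across a collection of reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


# Hypothetical reads, for illustration only.
example_reads = ["ACGTAC", "CGTACG"]
example_counts = count_kmers(example_reads, 3)
```

A larger k yields fewer k-mers per read, so low-coverage transcripts are more likely to break into fragments, while a smaller k makes near-identical sequences harder to tell apart. This is why assemblies are often compared across a range of k values.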