Background
Several recent studies have demonstrated the use of Roche 454 sequencing technology for de novo transcriptome analysis. Low error rates and high coverage also allow for effective SNP discovery and genetic diversity estimates. However, genetically diverse datasets, such as those sourced from natural populations, pose challenges for assembly programs and subsequent analysis. Further, estimating the effectiveness of transcript discovery using Roche 454 transcriptome data is still a difficult task.
Results
Using the Roche 454 FLX Titanium platform, we sequenced and assembled larval transcriptomes for two butterfly species: the Propertius duskywing, Erynnis propertius (Lepidoptera: Hesperiidae) and the Anise swallowtail, Papilio zelicaon (Lepidoptera: Papilionidae). The Expressed Sequence Tags (ESTs) generated represent a diverse sample drawn from multiple populations, developmental stages, and stress treatments.
Despite this diversity, > 95% of the ESTs assembled into long (> 714 bp on average) and highly covered (> 9.6× on average) contigs. To estimate the effectiveness of transcript discovery, we compared the number of bases in the hit region of unigenes (contigs and singletons) to the length of the best match silkworm (Bombyx mori) protein. This "ortholog hit ratio" gives a close estimate of the amount of the transcript discovered relative to a model lepidopteran genome. For each species, we tested two assembly programs and two parameter sets; although CAP3 is commonly used for such data, the assemblies produced by Celera Assembler with modified parameters were chosen over those produced by CAP3 based on contig and singleton counts as well as ortholog hit ratio analysis. In the final assemblies, 1,413 E. propertius and 1,940 P. zelicaon unigenes had a ratio > 0.8; 2,866 E. propertius and 4,015 P. zelicaon unigenes had a ratio > 0.5.
Conclusions
Ultimately, these assemblies and SNP data will be used to generate microarrays for ecoinformatics examining climate change tolerance of different natural populations. These studies will benefit from high quality assemblies with few singletons (less than 26% of bases for each assembled transcriptome are present in unassembled singleton ESTs) and effective transcript discovery (over 6,500 of our putative orthologs cover at least 50% of the corresponding model silkworm gene).
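The ortholog hit ratio described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names, the 1-based inclusive hit coordinates, and the conversion of protein length from amino acids to nucleotides (multiplying by 3) are all assumptions made here so that numerator and denominator share the same unit.

```python
def ortholog_hit_ratio(hit_start, hit_end, protein_length_aa):
    """Fraction of the best-match protein covered by the unigene's hit region.

    hit_start, hit_end: 1-based inclusive coordinates of the hit region on the
        unigene, in nucleotides (assumed convention; either order is accepted).
    protein_length_aa: length of the best-match protein in amino acids.
    """
    hit_bases = abs(hit_end - hit_start) + 1
    # Assumption: express the protein length in nucleotides (3 bases per codon).
    return hit_bases / (3 * protein_length_aa)


def count_above(ratios, threshold):
    """Count unigenes whose ratio strictly exceeds the threshold."""
    return sum(1 for r in ratios if r > threshold)


if __name__ == "__main__":
    # Hypothetical hits: (hit_start, hit_end, protein_length_aa)
    hits = [(1, 300, 100), (10, 159, 100), (500, 230, 100)]
    ratios = [ortholog_hit_ratio(*h) for h in hits]
    print(ratios)
    print(count_above(ratios, 0.8), count_above(ratios, 0.5))
```

A ratio near 1 suggests the unigene covers most of the orthologous coding sequence, which is how thresholds such as > 0.8 and > 0.5 in the abstract would be tallied.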
Background
Transcriptome sequencing and assembly represent a great resource for the study of non-model species, and many metrics have been used to evaluate and compare these assemblies. Unfortunately, it is still unclear which of these metrics accurately reflect assembly quality.
Results
We simulated sequencing transcripts of Drosophila melanogaster. By assembling these simulated reads using both a “perfect” and a modern transcriptome assembler while varying read length and sequencing depth, we evaluated quality metrics to determine whether they 1) revealed perfect assemblies to be of higher quality, and 2) revealed perfect assemblies to be more complete as data quantity increased.
Several commonly used metrics were not consistent with these expectations, including average contig coverage and length, though they became consistent when singletons were included in the analysis. We found several annotation-based metrics to be consistent and informative, including contig reciprocal best hit count and contig unique annotation count. Finally, we evaluated a number of novel metrics such as reverse annotation count, contig collapse factor, and the ortholog hit ratio, discovering that each assesses assembly quality in a unique way.
Conclusions
Although much attention has been given to transcriptome assembly, little research has focused on determining how best to evaluate assemblies, particularly in light of the variety of options available for read length and sequencing depth. Our results provide an important review of these metrics and give researchers tools to produce the highest quality transcriptome assemblies.
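One of the annotation-based metrics named above, the contig reciprocal best hit count, can be illustrated with a short sketch. The abstract does not define how the count was computed, so this follows the generic reciprocal-best-hit idea: a contig and a reference sequence are paired only if each is the other's top-scoring hit. The input layout (tuples of query, subject, score) and the function names are hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, score) tuples, e.g. parsed from a
        similarity-search report (hypothetical input format).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hit_count(contig_vs_ref, ref_vs_contig):
    """Count contig/reference pairs that are each other's best hit."""
    forward = best_hits(contig_vs_ref)
    reverse = best_hits(ref_vs_contig)
    return sum(1 for contig, ref in forward.items() if reverse.get(ref) == contig)


if __name__ == "__main__":
    # Hypothetical search results in both directions.
    contig_vs_ref = [("c1", "p1", 100), ("c1", "p2", 50),
                     ("c2", "p1", 80), ("c3", "p3", 60)]
    ref_vs_contig = [("p1", "c1", 100), ("p1", "c2", 80),
                     ("p2", "c1", 50), ("p3", "c3", 60)]
    print(reciprocal_best_hit_count(contig_vs_ref, ref_vs_contig))
```

In this toy input, c2 also hits p1, but p1's best hit is c1, so only the c1/p1 and c3/p3 pairs are counted.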
Summary
The developmental programme of grape berries within a cluster is coordinated to synchronize their ripening. Altered transcriptional events and metabolite accumulation are responsible for the differential progress of ripening.