Background: Adapter trimming is a prerequisite step for analyzing next-generation sequencing (NGS) data when the reads are longer than the target DNA/RNA fragments. Although typically used in small RNA sequencing, adapter trimming is also used widely in other applications, such as genome DNA sequencing and transcriptome RNA/cDNA sequencing, where fragments shorter than a read are sometimes obtained because of the limitations of NGS protocols. For the newly emerged Nextera long mate-pair (LMP) protocol, junction adapters are located in the middle of all properly constructed fragments; hence, adapter trimming is essential to obtain the correct paired reads. However, our investigations have shown that few adapter trimming tools meet both efficiency and accuracy requirements simultaneously. The performance of these tools can be even worse for paired-end and/or mate-pair sequencing.
Results: To improve the efficiency of adapter trimming, we devised a novel algorithm, the bit-masked k-difference matching algorithm, which has O(kn) expected time with O(m) space, where k is the maximum number of differences allowed, n is the read length, and m is the adapter length. This algorithm makes it possible to fully enumerate all candidates that meet a specified threshold, e.g. error ratio, within a short period of time. To improve the accuracy of this algorithm, we designed a simple and easy-to-explain statistical scoring scheme to evaluate candidates in the pattern matching step. We also devised scoring schemes to fully exploit the paired-end/mate-pair information when it is applicable. All these features have been implemented in an industry-standard tool named Skewer (https://sourceforge.net/projects/skewer). Experiments on simulated data, real data of small RNA sequencing, paired-end RNA sequencing, and Nextera LMP sequencing showed that Skewer outperforms all other similar tools that have the same utility. Further, Skewer is considerably faster than other tools that have comparable accuracies; namely, one times faster for single-end sequencing, more than 12 times faster for paired-end sequencing, and 49% faster for LMP sequencing.
Conclusions: Skewer achieved as yet unmatched accuracies for adapter trimming with a low time bound.
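To make the k-difference matching problem described in the abstract above concrete, here is a minimal sketch in Python. It is not Skewer's bit-masked algorithm and not its scoring scheme: it uses the classic dynamic-programming approach to approximate matching, which takes O(mn) time and O(m) space, only to show what "an adapter occurrence with at most k differences" means. The function names `find_adapter` and `trim_adapter` and the adapter sequence are assumptions made for illustration, and the sketch ignores partial adapters hanging off the 3' end of a read as well as paired-end information.

```python
from typing import Optional, Tuple


def find_adapter(read: str, adapter: str, k: int) -> Optional[int]:
    """Return the 0-based read position where the best adapter match starts.

    A match is any substring of `read` within `k` differences (substitutions,
    insertions, deletions) of the whole adapter. Returns None if none exists.
    Illustrative only: plain dynamic programming, not Skewer's algorithm.
    """
    m = len(adapter)
    if m == 0:
        return None
    # prev[i] = (cost, start) of the best alignment of adapter[:i] against
    # a substring of the read that ends at the previous read position.
    prev = [(i, 0) for i in range(m + 1)]
    best: Optional[Tuple[int, int]] = None
    for j, base in enumerate(read, start=1):
        # An empty adapter prefix matches for free, so a match may start anywhere.
        cur = [(0, j)]
        for i in range(1, m + 1):
            diag_cost, diag_start = prev[i - 1]
            substitute = (diag_cost + (adapter[i - 1] != base), diag_start)
            skip_adapter_base = (cur[i - 1][0] + 1, cur[i - 1][1])
            skip_read_base = (prev[i][0] + 1, prev[i][1])
            cur.append(min(substitute, skip_adapter_base, skip_read_base))
        # Keep the candidate with the fewest differences (earliest start on ties).
        if cur[m][0] <= k and (best is None or cur[m] < best):
            best = cur[m]
        prev = cur
    return None if best is None else best[1]


def trim_adapter(read: str, adapter: str, k: int) -> str:
    """Cut the read at the start of the best adapter match, if there is one."""
    start = find_adapter(read, adapter, k)
    return read if start is None else read[:start]


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # example sequence, chosen for illustration
    print(trim_adapter("ACGTACGTTT" + "AGATCGGTAGAGC", adapter, k=1))
```

In this sketch every read position costs O(m) work regardless of k. The abstract's O(kn) expected-time bound is the improvement the authors report for their bit-masked algorithm, which this example does not attempt to reproduce.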
A fast, accurate, and full indexing of viruses and viroids in a sample for the inspection and quarantine services and disease management is desirable but was unrealistic until recently. This article reviews the rapid and exciting recent progress in the use of next-generation sequencing (NGS) technologies for the identification of viruses and viroids in plants. A total of four viroids/viroid-like RNAs and 49 new plant RNA and DNA viruses from 18 known or unassigned virus families have been identified from plants since 2009. A comparison of enrichment strategies reveals that full indexing of RNA and DNA viruses as well as viroids in a plant sample at single-nucleotide resolution is made possible by one NGS run of total small RNAs, followed by data mining with homology-dependent and homology-independent computational algorithms. Major challenges in the application of NGS technologies to pathogen discovery are discussed.
Orchidaceae (orchids) is the largest family in the monocots, including about 25,000 species in 880 genera and five subfamilies. Many orchids are highly valued for their beautiful and long-lasting flowers. However, the phylogenetic relationships among the five orchid subfamilies remain unresolved. The major dispute centers on whether the three one-stamened subfamilies, Epidendroideae, Orchidoideae, and Vanilloideae, are monophyletic or paraphyletic. Moreover, structural changes in the plastid genome (plastome) and the effective genetic loci at the species-level phylogenetics of orchids have rarely been documented. In this study, we compared 53 orchid plastomes, including four newly sequenced ones, that represent four remote genera: Dendrobium, Goodyera, Paphiopedilum, and Vanilla. These differ from one another not only in their lengths of inverted repeats and small single copy regions but also in their retention of ndh genes. Comparative analyses of the plastomes revealed that the expansion of inverted repeats in Paphiopedilum and Vanilla is associated with a loss of ndh genes. In orchid plastomes, mutational hotspots are genus specific. After having carefully examined the data, we propose that the three loci 5′trnK-rps16, trnS-trnG, and rps16-trnQ might be powerful markers for genera within Epidendroideae, and clpP-psbB and rps16-trnQ might be markers for genera within Cypripedioideae. After analyses of a partitioned dataset, we found that our plastid phylogenomic trees were congruent in a topology where two one-stamened subfamilies (i.e., Epidendroideae and Orchidoideae) were sisters to a multi-stamened subfamily (i.e., Cypripedioideae) rather than to the other one-stamened subfamily (Vanilloideae), suggesting that the living one-stamened orchids are paraphyletic.