Background
Microbiome studies often involve sequencing a marker gene to identify the microorganisms in samples of interest. Sequence classification is a critical component of this process, whereby sequences are assigned to a reference taxonomy containing known sequence representatives of many microbial groups. Previous studies have shown that existing classification programs often assign sequences to reference groups even if they belong to novel taxonomic groups that are absent from the reference taxonomy. This high rate of “over classification” is particularly detrimental in microbiome studies because reference taxonomies are far from comprehensive.
Results
Here, we introduce IDTAXA, a novel approach to taxonomic classification that employs principles from machine learning to reduce over classification errors. Using multiple reference taxonomies, we demonstrate that IDTAXA has higher accuracy than popular classifiers such as BLAST, MAPSeq, QIIME, SINTAX, SPINGO, and the RDP Classifier. Similarly, IDTAXA yields far fewer over classifications on Illumina mock microbial community data when the expected taxa are absent from the training set. Furthermore, IDTAXA offers many practical advantages over other classifiers, such as maintaining low error rates across varying input sequence lengths and withholding classifications from input sequences composed of random nucleotides or repeats.
Conclusions
IDTAXA’s classifications may lead to different conclusions in microbiome studies because of the substantially reduced number of taxa that are incorrectly identified through over classification. Although misclassification error is relatively minor, we believe that many remaining misclassifications are likely caused by errors in the reference taxonomy. We describe how IDTAXA is able to identify many putative mislabeling errors in reference taxonomies, enabling training sets to be automatically corrected by eliminating spurious sequences.
IDTAXA is part of the DECIPHER package for the R programming language, available through the Bioconductor repository or accessible online (http://DECIPHER.codes).
Electronic supplementary material
The online version of this article (10.1186/s40168-018-0521-5) contains supplementary material, which is available to authorized users.
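The idea of withholding a classification when support is weak can be illustrated with a generic k-mer bootstrap classifier. This is a minimal sketch, not IDTAXA's algorithm. IDTAXA itself is implemented in R within DECIPHER, and the names `kmers`, `classify`, `n_boot`, `sample_frac`, and `threshold` are hypothetical. The sketch repeatedly subsamples the query's k-mers and records which reference taxon shares the most k-mers in each replicate. It reports a taxon only if that taxon wins a large enough fraction of replicates.

```python
import random
from collections import Counter


def kmers(seq, k=4):
    """Return the set of distinct k-mers (length-k substrings) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify(query, references, k=4, n_boot=100, sample_frac=0.5,
             threshold=0.6, seed=0):
    """Return (taxon, confidence); taxon is None when confidence < threshold.

    references maps a taxon label to a list of reference sequences.
    (Hypothetical toy classifier, not the IDTAXA method.)
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    ref_kmers = {taxon: set().union(*(kmers(s, k) for s in seqs))
                 for taxon, seqs in references.items()}
    q = sorted(kmers(query, k))
    if not q:
        return None, 0.0  # query shorter than k: nothing to compare
    n_sample = max(1, int(len(q) * sample_frac))
    votes = Counter()
    for _ in range(n_boot):
        sample = rng.sample(q, n_sample)  # random subset of the query's k-mers
        scores = {t: sum(km in ref for km in sample)
                  for t, ref in ref_kmers.items()}
        best = max(scores.values())
        winners = [t for t, s in scores.items() if s == best]
        # A replicate votes only if a single taxon shares strictly the most k-mers.
        if best > 0 and len(winners) == 1:
            votes[winners[0]] += 1
    if not votes:
        return None, 0.0  # no replicate found any supported taxon
    taxon, count = votes.most_common(1)[0]
    confidence = count / n_boot
    return (taxon if confidence >= threshold else None), confidence
```

A query built from k-mers found in no reference receives no votes in any replicate. Such a query might be a low-complexity repeat or a member of a novel group, and it comes back unclassified (`None`) instead of being forced into the closest taxon. Raising `threshold` trades fewer over classifications for more withheld assignments.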
Lipidomics has great promise in various applications; however, a major bottleneck in lipidomics is the accurate and comprehensive annotation of high-resolution tandem mass spectral data. While the number of available lipidomics software tools has increased drastically over the past five years, reducing false positives and obtaining structurally accurate annotations remain significant challenges. We introduce Lipid Annotator, a user-friendly software tool for lipidomic analysis of data collected by liquid chromatography high-resolution tandem mass spectrometry (LC-HRMS/MS). We validate annotation accuracy against lipid standards and other lipidomics software. Lipid Annotator was integrated into a workflow that applies an iterative exclusion MS/MS acquisition strategy to National Institute of Standards and Technology (NIST) SRM 1950 Metabolites in Frozen Human Plasma using reversed-phase LC-HRMS/MS. Lipid Annotator, LipidMatch, and MS-DIAL produced consensus annotations at the lipid class level for 98% and 96% of features detected in positive and negative mode, respectively. Lipid Annotator reports the percentages of fatty acyl constituent species and employs scoring algorithms based on probability theory, which are less subjective than the tolerance-based and weighted match scores commonly used by available software. Lipid Annotator enables analysis of large sample cohorts and improves data-processing throughput compared with previous lipidomics software.
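The contrast between a fixed tolerance window and a probability-based score can be made concrete for a single precursor m/z. This is a minimal sketch that assumes Gaussian mass error in ppm and a uniform prior over candidate annotations. It is not Lipid Annotator's scoring algorithm, which the abstract does not specify. The function names, `sigma_ppm`, and the m/z values in the usage note are hypothetical.

```python
import math


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def tolerance_match(observed_mz, theoretical_mz, tol_ppm=5.0):
    """Binary accept/reject within a fixed ppm window."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def candidate_probabilities(observed_mz, candidates, sigma_ppm=2.0):
    """Posterior probability of each candidate annotation.

    Assumes Gaussian ppm error with standard deviation sigma_ppm and a
    uniform prior over the candidates (hypothetical scoring model).
    """
    likelihoods = {
        name: math.exp(-0.5 * (ppm_error(observed_mz, mz) / sigma_ppm) ** 2)
        for name, mz in candidates.items()
    }
    total = sum(likelihoods.values())
    if total == 0.0:
        return {name: 0.0 for name in candidates}  # all candidates implausible
    return {name: lik / total for name, lik in likelihoods.items()}
```

Consider two hypothetical candidates at 800.0 and about 3 ppm below it, with an observed m/z of 800.0. Both pass a 5 ppm `tolerance_match`, so the window alone cannot rank them. `candidate_probabilities` assigns the closer candidate the larger share of probability, and the shares sum to 1.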
Spatial control of chemical reactions, with micrometer- and nanometer-scale resolution, has important consequences for one-pot synthesis, engineering complex reactions, developmental biology, cellular biochemistry and emergent behavior. We review synthetic methods to engineer this spatial control using chemical diffusion from spherical particles, shells and polyhedra. We discuss systems that enable both isotropic and anisotropic chemical release from isolated and arrayed particles to create inhomogeneous and spatially patterned chemical fields. In addition to such finite chemical sources, we also discuss spatial control enabled with laminar flow in 2D and 3D microfluidic networks. Throughout the paper, we highlight applications of spatially controlled chemistry in chemical kinetics, reaction-diffusion systems, chemotaxis and morphogenesis.
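A simple way to see how arrayed particles create a spatially patterned chemical field is to superpose steady-state diffusion fields from isotropic sources. This is a minimal sketch under idealized assumptions that the review does not impose in general. It treats each particle as a point source with a constant release rate Q in an unbounded, quiescent medium, with no reaction and zero concentration far away. Each source then contributes c(r) = Q / (4πDr), and linearity lets the contributions add. The function and parameter names (`point_source_field`, `release_rate`, `diffusivity`) are hypothetical, and units only need to be consistent.

```python
import math


def point_source_field(point, sources, diffusivity):
    """Steady-state concentration at `point` from constant-rate point sources.

    Each source is (position, release_rate). Assumes isotropic release into
    an unbounded medium with no reaction or flow, so single-source solutions
    c(r) = Q / (4*pi*D*r) can be superposed.
    """
    total = 0.0
    for position, release_rate in sources:
        r = math.dist(point, position)
        if r == 0.0:
            raise ValueError("field is singular at a point source")
        total += release_rate / (4.0 * math.pi * diffusivity * r)
    return total
```

Evaluating the function on a grid around a regular array of sources produces a patterned field with peaks at the particles. Anisotropic release, finite particle size, reactions, and flow all require models beyond this sketch.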
We identify a decidable synthesis problem for a class of programs of unbounded size with conditionals and iteration that work over infinite data domains. The programs in our class use uninterpreted functions and relations, and abide by a restriction called coherence that was recently identified to yield decidable verification. We formulate a powerful grammar-restricted (syntax-guided) synthesis problem for coherent uninterpreted programs, and we show the problem to be decidable, identify its precise complexity, and also study several variants of the problem.
We investigate the decidability of automatic program verification for programs that manipulate heaps, and in particular, decision procedures for proving memory safety for them. We extend recent work that identified a decidable subclass of uninterpreted programs to a class of alias-aware programs that can update maps. We apply this theory to develop verification algorithms for memory safety: determining whether a heap-manipulating program that allocates and frees memory locations and manipulates heap pointers never dereferences an unallocated memory location. We show that this problem is decidable when the initial allocated heap forms a forest data-structure and when programs are streaming-coherent, which intuitively restricts programs to make a single pass over a data-structure. Our experimental evaluation on a set of library routines that manipulate forest data-structures shows that common single-pass algorithms on data-structures often fall in the decidable class, and that our decision procedure is efficient in verifying them.

Deciding Memory Safety for Single-Pass Heap-Manipulating Programs

… must either be the case that x is different from z in every data-model/heap or it must be the case that x is equal to z in all data-models/heaps.

We show that alias-awareness resolves these problems. For alias-aware programs (programs whose executions are all alias-aware), we show that we can associate terms with variables after a computation that updates maps, and further show that the notion of coherence extends naturally to programs that update maps. We then show that for coherent alias-aware programs, the verification problem becomes decidable. These results constitute the first main contribution of the paper.

Application to Verifying Memory Safety
We then study the application of our framework to verifying memory safety.
Our key observation is that programs that manipulate forest data-structures (data-structures consisting of disjoint tree-like structures) are naturally alias-aware. Intuitively, when traversing forest data-structures, aliasing information is implicitly present. For instance, if x points to a location of a forest data-structure, then the location pointed to by x, the one pointed to by its left child x·left, and the one pointed to by its right child x·right are all different.

In this paper, we define memory safety as follows. A heap-manipulating program starts with a set of allocated heap locations. During its execution, it dereferences pointers on heap locations, and allocates and frees locations. A program is memory safe if it never dereferences a location that is not in the allocated set. This definition of memory safety captures the usual categories of memory safety errors, such as null-pointer dereferences, use after free, use of uninitialized memory, and illegal freeing of memory [Hicks 2014]. However, in this paper, we do not consider allocation of contiguous blocks of memory of arbitrary size (and hence do not handle arrays and buffer overflows of arrays in languages like ...
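The memory-safety definition and the forest precondition can be illustrated with a small dynamic monitor. This is a minimal sketch that checks one concrete execution against the allocated set. It is not the paper's decision procedure, which proves safety for all executions of streaming-coherent programs. The names `Heap`, `make_heap`, `is_forest`, `free_list_safe`, and `free_list_buggy` are hypothetical. Pointer fields are modeled as a dict, with `None` standing for nil.

```python
from dataclasses import dataclass


class MemorySafetyError(Exception):
    """Raised when the monitored execution violates memory safety."""


@dataclass
class Heap:
    fields: dict      # location -> {field name: location or None}
    allocated: set    # locations currently allocated

    def deref(self, loc, name):
        # Reading a pointer field dereferences loc, so loc must be allocated.
        # This also catches nil (None) dereferences and use after free.
        if loc not in self.allocated:
            raise MemorySafetyError(f"dereference of unallocated location {loc!r}")
        return self.fields[loc][name]

    def free(self, loc):
        # Freeing a location that is not allocated (e.g. a double free) is illegal.
        if loc not in self.allocated:
            raise MemorySafetyError(f"illegal free of {loc!r}")
        self.allocated.remove(loc)


def make_heap(fields):
    """Initial heap in which every listed location is allocated."""
    return Heap(fields=fields, allocated=set(fields))


def is_forest(fields):
    """True if no location has two incoming pointers and there are no cycles."""
    parent = {}
    for loc, ptrs in fields.items():
        for child in ptrs.values():
            if child is None:
                continue
            if child in parent:  # shared node: two pointers reach it
                return False
            parent[child] = loc
    for start in fields:
        node, steps = parent.get(start), 0
        while node is not None:
            if node == start or steps > len(fields):
                return False  # following parent links loops: a cycle
            node, steps = parent.get(node), steps + 1
    return True


def free_list_safe(heap, head):
    """Single pass over a 'next'-linked list, freeing each node."""
    x = head
    while x is not None:
        nxt = heap.deref(x, "next")  # read the successor before freeing x
        heap.free(x)
        x = nxt


def free_list_buggy(heap, head):
    """Same loop, but reads x.next after x has been freed."""
    x = head
    while x is not None:
        heap.free(x)
        x = heap.deref(x, "next")  # use after free
```

In a forest as checked by `is_forest`, no node is reachable through two different pointers. This is why x, x·left, and x·right cannot coincide. Run on the same three-node list, `free_list_safe` empties the allocated set, while `free_list_buggy` raises `MemorySafetyError` on its first read after a free.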