Despite 25 years of progress in understanding the molecular mechanisms of olfaction, it is still not possible to predict whether a given molecule will have a perceived odor, or what olfactory percept it will produce. To address this stimulus-percept problem for olfaction, we organized the crowd-sourced DREAM Olfaction Prediction Challenge. Working from a large olfactory psychophysical dataset, teams developed machine learning algorithms to predict sensory attributes of molecules based on their chemoinformatic features. The resulting models predicted odor intensity and pleasantness with high accuracy, and also successfully predicted eight semantic descriptors ("garlic", "fish", "sweet", "fruit," "burnt", "spices", "flower", "sour").Regularized linear models performed nearly as well as random-forest-based approaches, with a predictive accuracy that closely approaches a key theoretical limit. The models presented here make it possible to predict the perceptual qualities of virtually any molecule with an impressive degree of accuracy to reverse-engineer the smell of a molecule. One Sentence Summary:Results of a crowdsourcing competition show that it is possible to accurately predict and reverse-engineer the smell of a molecule. Main Text:In vision and hearing, the wavelength of light and frequency of sound are highly predictive of color and tone. In contrast, it is not currently possible to predict the smell of a molecule from its chemical structure (1, 2). This stimulus-percept problem has been difficult to solve in olfaction because odor stimuli do not vary continuously in stimulus space, and the size and dimensionality of olfactory perceptual space is unknown (1, 3,4). Some molecules with very similar chemical structures can be discriminated by humans (5, 6), and molecules with very different structures sometimes produce nearly identical percepts (2). Recent computational efforts developed models to relate chemical structure to odor percept (2,(7)(8)(9)(10)(11), but many relied on psychophysical data from a single 30-year-old study that used odorants with limited structural and perceptual diversity (12, 13).Twenty-two teams competing in the DREAM Olfaction Prediction Challenge (14) were given a large, unpublished psychophysical dataset collected by Keller and Vosshall from 49 individuals who profiled 476 structurally and perceptually diverse molecules, including those that are unfamiliar, unpleasant, or nearly odorless (15) (Fig. 1a). We supplied 4884 physicochemical features of each of the molecules smelled by the subjects, including atom types, . CC-BY-NC-ND 4.0 International license peer-reviewed) is the author/funder. It is made available under a The copyright holder for this preprint (which was not . http://dx.doi.org/10.1101/082495 doi: bioRxiv preprint first posted online Oct. 21, 2016; 3 functional groups, and topological and geometrical properties that were computed using Dragon chemoinformatic software (version 6) (Fig. 1b).Using a baseline linear model developed for the challenge and inspi...
High-throughput DNA sequencing has enabled us to look beyond consensus reference sequences to the variation observed in sequences within organisms; their haplotypes. Recovery, or assembly of haplotypes has proved computationally difficult and there exist many probabilistic heuristics that attempt to recover the original haplotypes for a single organism of known ploidy. However, existing approaches make simplifications or assumptions that are easily violated when investigating sequence variation within a metagenome.We propose the metahaplome as the set of haplotypes for any particular genomic region of interest within a metagenomic data set and present Hansel and Gretel, a data structure and algorithm that together provide a proof of concept framework for the recovery of true haplotypes from a metagenomic data set. The algorithm performs incremental haplotype recovery, using smoothed Naive Bayes -a simple, efficient and effective method.Hansel and Gretel pose several advantages over existing solutions: the framework is capable of recovering haplotypes from metagenomes, does not require a priori knowledge about the input data, makes no assumptions regarding the distribution of alleles at variant sites, is robust to error, and uses all available evidence from aligned reads, without altering or discarding observed variation. We evaluate our approach using synthetic metahaplomes constructed from sets of real genes and show that up to 99% of SNPs on a haplotype can be correctly recovered from short reads that originate from a metagenomic data set.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.