Motivation: Recent advances in high-throughput sequencing (HTS) have made it possible to monitor genomes in great detail. New experiments not only use HTS to measure genomic features at one time point but also monitor them changing over time with the aim of identifying significant changes in their abundance. In population genetics, for example, allele frequencies are monitored over time to detect significant frequency changes that indicate selection pressures. Previous attempts at analyzing data from HTS experiments have been limited as they could not simultaneously include data at intermediate time points, replicate experiments and sources of uncertainty specific to HTS such as sequencing depth.
Results: We present the beta-binomial Gaussian process model for ranking features with significant non-random variation in abundance over time. The features are assumed to represent proportions, such as the proportion of an alternative allele in a population. We use the beta-binomial model to capture the uncertainty arising from finite sequencing depth and combine it with a Gaussian process model over the time series. In simulations that mimic the features of experimental evolution data, the proposed method clearly outperforms classical testing in average precision of finding selected alleles. We also present simulations exploring different experimental design choices and results on real data from a Drosophila experimental evolution experiment on temperature adaptation.
Availability and implementation: R software implementing the test is available at https://github.com/handetopa/BBGP.
Contact: hande.topa@aalto.fi, agnes.jonas@vetmeduni.ac.at, carolin.kosiol@vetmeduni.ac.at, antti.honkela@hiit.fi
Supplementary information: Supplementary data are available at Bioinformatics online.
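To illustrate how finite sequencing depth translates into uncertainty about a proportion, the following minimal Python sketch computes the posterior mean and variance of an allele frequency from read counts under a Beta prior with binomial read sampling. The function name, the uniform Beta(1, 1) default prior and the example counts are assumptions made for illustration; this is not the interface of the BBGP R software.

```python
def beta_posterior_moments(alt_reads, depth, prior_a=1.0, prior_b=1.0):
    """Posterior mean and variance of an allele frequency.

    Assumes a Beta(prior_a, prior_b) prior on the frequency and binomial
    sampling of reads, so the posterior is
    Beta(prior_a + alt_reads, prior_b + depth - alt_reads).
    """
    if depth < 0 or not 0 <= alt_reads <= depth:
        raise ValueError("need 0 <= alt_reads <= depth")
    a = prior_a + alt_reads
    b = prior_b + (depth - alt_reads)
    total = a + b
    mean = a / total
    variance = a * b / (total ** 2 * (total + 1.0))
    return mean, variance


if __name__ == "__main__":
    # Same observed proportion (50%) at two hypothetical sequencing depths.
    for alt, depth in [(5, 10), (50, 100)]:
        mean, var = beta_posterior_moments(alt, depth)
        print(f"depth={depth}: mean={mean:.3f}, variance={var:.5f}")
```

Both calls give the same posterior mean, but the variance shrinks as depth grows. A time-series model can take that depth-dependent uncertainty into account at each time point.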
The effective population size (N_e) is a major factor determining allele frequency changes in natural and experimental populations. Temporal methods provide a powerful and simple approach to estimate short-term N_e. They use allele frequency shifts between temporal samples to calculate the standardized variance, which is directly related to N_e. Here we focus on experimental evolution studies that often rely on repeated sequencing of samples in pools (Pool-seq). Pool-seq is cost-effective and often outperforms individual-based sequencing in estimating allele frequencies, but it is associated with atypical sampling properties: in addition to sampling individuals, sequencing DNA in pools leads to a second round of sampling, which increases the variance of allele frequency estimates. We propose a new estimator of N_e, which relies on allele frequency changes in temporal data and corrects for the variance in both sampling steps. In simulations, we obtain accurate N_e estimates, as long as the drift variance is not too small compared to the sampling and sequencing variance. In addition to genome-wide N_e estimates, we extend our method using a recursive partitioning approach to estimate N_e locally along the chromosome. Since the type I error is controlled, our method permits the identification of genomic regions that differ significantly in their N_e estimates. We present an application to Pool-seq data from experimental evolution with Drosophila and provide recommendations for whole-genome data. The estimator is computationally efficient and available as an R package at https://github.com/ThomasTaus/Nest.
KEYWORDS: effective population size; genetic drift; Pool-seq; experimental evolution
During experimental evolution studies, populations are maintained under specific laboratory conditions (Kawecki et al. 2012; Long et al. 2015; Schlötterer et al. 2015).
In sexually reproducing organisms, the census population size is typically kept fixed at fairly low numbers, rarely exceeding 2000 individuals. With such small population sizes, genetic drift causes stochastic fluctuations in allele frequencies. Under neutrality, the level of random frequency changes is determined by the effective population size (N_e) (Wright 1931). Furthermore, the efficacy of selection is influenced by N_e. For weakly selected alleles, the probability of fixation is directly proportional to the product of N_e and the intensity of selection (Fisher 1930; Kimura 1964). As changes in allele frequency are greatly affected by the population size, it is fundamental to estimate N_e accurately to understand molecular variation in experimental evolution studies. Krimbas and Tsakas (1971) estimated N_e using the standardized variance of allele frequency (F, see also Falconer and Mackay 1996) from longitudinal samples in natural populations of olive flies. As F was calculated from these samples, they accounted for the sampling variance that also contributed to the true allele frequency variance. This approach was further improved and used by sev...
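The idea behind temporal methods can be sketched in Python as follows. This is a minimal illustration of a classical moment-based temporal estimator, not the Pool-seq-corrected estimator proposed in the paper: it removes the variance from sampling individuals at both time points but ignores the second, sequencing-related sampling step. The function names, the particular form of F (a Nei-Tajima-style standardized variance), the assumption that each sample consists of n0 and nt diploid individuals drawn independently of the breeding population, and the example frequencies are all assumptions made for illustration.

```python
def standardized_variance(p0, pt):
    """Average standardized variance F over biallelic loci.

    For frequencies x (first sample) and y (second sample) each locus
    contributes (x - y)^2 / ((x + y) / 2 - x * y). Loci fixed for the
    same allele in both samples have a zero denominator and are skipped.
    """
    terms = []
    for x, y in zip(p0, pt):
        denom = (x + y) / 2.0 - x * y
        if denom > 0:
            terms.append((x - y) ** 2 / denom)
    if not terms:
        raise ValueError("no informative loci")
    return sum(terms) / len(terms)


def temporal_ne(p0, pt, generations, n0, nt):
    """Classical temporal estimate of the effective population size.

    The expected contribution of sampling n0 and nt diploid individuals,
    1/(2*n0) + 1/(2*nt), is subtracted from F; the remainder is
    attributed to drift over the given number of generations.
    """
    f = standardized_variance(p0, pt)
    drift = f - 1.0 / (2.0 * n0) - 1.0 / (2.0 * nt)
    if drift <= 0:
        # Sampling noise explains all observed change: no finite estimate.
        return float("inf")
    return generations / (2.0 * drift)


if __name__ == "__main__":
    # Hypothetical allele frequencies at generation 0 and generation 10.
    start = [0.50, 0.30, 0.80]
    end = [0.60, 0.25, 0.70]
    print(temporal_ne(start, end, generations=10, n0=100, nt=100))
```

When the sampling correction is as large as the observed F, the drift signal is indistinguishable from sampling noise and the sketch returns infinity. This mirrors the requirement that the drift variance not be too small compared with the sampling variance.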
Abstract: Generating ensembles from multiple individual classifiers is a common approach to raising decision accuracy, and majority voting is a popular decision rule. In this paper, we generalize classic majority voting by introducing a further constraint that determines whether a correct or false decision is made when k correct votes are present among the total n. This generalization is motivated by object detection problems, where the members of the ensemble are image processing algorithms that cast their votes as pixels in the image domain. The shape of the desired object defines a geometric constraint that the votes must obey in order to reach a joint decision: the votes in this scenario should fall inside a region matching the shape of the object. We give several theoretical results in this new model for both dependent and independent classifiers, whose individual accuracies may also differ. As a real-world example, we present our ensemble-based system developed for the detection of the optic disc in retinal images. For this problem, we show experimentally how our model can characterize such a system and how it can guide further improvement of the system.
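For the independent case, the voting setup can be illustrated with a short Python sketch. It computes the distribution of the number of correct votes among n independent classifiers with possibly different accuracies, then weights each count k by the probability that k correct votes yield a correct ensemble decision. The function names and the callable `p_correct_given_k` are hypothetical; classic majority voting is recovered when that probability is 1 for k > n/2 and 0 otherwise. Dependent classifiers and the specific geometric constraint are not modeled here.

```python
def correct_vote_distribution(accuracies):
    """dist[k] = probability that exactly k classifiers vote correctly.

    Assumes independent classifiers; accuracies may differ, so this is
    a Poisson binomial distribution built by dynamic programming.
    """
    dist = [1.0]
    for p in accuracies:
        nxt = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            nxt[k] += prob * (1.0 - p)   # this classifier is wrong
            nxt[k + 1] += prob * p       # this classifier is right
        dist = nxt
    return dist


def ensemble_accuracy(accuracies, p_correct_given_k):
    """Probability of a correct ensemble decision.

    p_correct_given_k(k, n) is the (assumed) probability that the
    decision is correct when k of the n votes are correct.
    """
    n = len(accuracies)
    dist = correct_vote_distribution(accuracies)
    return sum(p_correct_given_k(k, n) * q for k, q in enumerate(dist))


def classic_majority(k, n):
    """Classic rule: correct exactly when more than half the votes are."""
    return 1.0 if 2 * k > n else 0.0


if __name__ == "__main__":
    print(ensemble_accuracy([0.8, 0.8, 0.8], classic_majority))
```

A constraint-aware rule can be expressed by passing a different callable, for example one that returns a value strictly between 0 and 1 for intermediate k to reflect that correct votes must also satisfy the constraint.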