Whole genome duplications (WGDs) followed by massive gene loss occurred in the evolutionary history of many groups. WGDs are usually inferred from the age distribution of paralogs (Ks-based methods) or from gene collinearity data (synteny). However, Ks-based methods can detect only recent WGDs, because of saturation effects and the difficulty of dating old duplicates, and synteny is difficult to reconstruct for distantly related species. Recently, Jiao et al. (Jiao Y, Wickett N, Ayyampalayam S, Chanderbali AS, Landherr L, Ralph PE, Tomsho LP, Hu Y, Liang H, Soltis PS, et al. 2011. Ancestral polyploidy in seed plants and angiosperms. Nature 473:97-100) introduced an empirical method that aims to detect a peak in duplication ages among nodes selected from a previous phylogenetic analysis. Here we present two rigorous methods based on data from multiple gene families and on a new probabilistic model. Our model assumes that all gene lineages are duplicated instantaneously at the WGD event, possibly followed by an almost immediate loss of some of the extra copies. Our reconciliation method relies on aligned molecular sequences, whereas our gene count method relies only on gene count data across species. Using extensive simulations, we show that both methods have good detection power. Surprisingly, the gene count method suffers no loss of power compared with the reconciliation method, even though it does not use sequence information. Finally, we illustrate the performance of our methods on a benchmark yeast data set. Both methods detect the well-known WGD in the Saccharomyces cerevisiae clade and agree on a small retention rate at the WGD, as established by synteny-based methods.
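To make the modelling assumption concrete, the following minimal sketch simulates the number of copies of one gene family along a single lineage. It assumes a linear birth-death process for ordinary gene duplication and loss before and after the WGD (rates `lam` and `mu`), and models the WGD as every gene being duplicated at once, with each extra copy retained independently with probability `q`. All function names and parameters are hypothetical illustrations; this is not the authors' reconciliation or gene count likelihood, only a simulator of the kind of data the gene count method works with.

```python
import math
import random


def birth_death(n, lam, mu, t, rng):
    """Simulate a linear birth-death process (Gillespie algorithm) for time t from n genes."""
    time = 0.0
    while n > 0:
        total_rate = n * (lam + mu)
        if total_rate == 0.0:          # no duplication or loss possible
            break
        time += rng.expovariate(total_rate)
        if time > t:                   # next event falls after the end of the branch
            break
        if rng.random() < lam / (lam + mu):
            n += 1                     # small-scale duplication
        else:
            n -= 1                     # gene loss
    return n


def wgd(n, q, rng):
    """Instantaneous WGD: every gene is doubled and each extra copy survives with probability q."""
    return n + sum(1 for _ in range(n) if rng.random() < q)


def simulate_lineage(n0, lam, mu, t_before, t_after, q, rng):
    """Gene count at the tip of one lineage carrying a WGD at time t_before."""
    n = birth_death(n0, lam, mu, t_before, rng)
    n = wgd(n, q, rng)
    return birth_death(n, lam, mu, t_after, rng)


def expected_count(n0, lam, mu, t_before, t_after, q):
    """Mean tip count: the WGD multiplies the birth-death mean by (1 + q)."""
    return n0 * (1.0 + q) * math.exp((lam - mu) * (t_before + t_after))
```

Averaging `simulate_lineage` over many families (with a seeded `random.Random`) should approach `expected_count`; a small retention rate `q` makes the WGD signal in gene counts correspondingly weak.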
Genomic selection focuses on predicting the breeding values of selection candidates using a high density of markers. It relies on the assumption that all quantitative trait loci (QTLs) tend to be in strong linkage disequilibrium (LD) with at least one marker. In this context, we present theoretical results on the accuracy of genomic selection, i.e., the correlation between predicted and true breeding values. Typically, the breeding values of so-called test individuals are predicted from their markers, using marker effects estimated by fitting a ridge regression model to a set of training individuals. We present a theoretical expression for the accuracy that is valid for any configuration of LD between QTLs and markers. We also introduce a new accuracy proxy that is free of QTL parameters and easily computable; it outperforms the proxies suggested in the literature, in particular those based on an estimated effective number of independent loci (Me). The theoretical formula, the new proxy, and existing proxies were compared on simulated data, and the results support the validity of our approach. The calculations are also illustrated on a new perennial ryegrass data set (367 individuals) genotyped for 24,957 single nucleotide polymorphisms (SNPs). In this case, most of the proxies studied yielded similar results because there were too few markers to cover the entire genome (2.7 Gb).
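The quantity being studied, the accuracy of ridge-regression genomic prediction, can be estimated empirically by simulation, which is how theoretical formulas and proxies are typically checked. The sketch below is a toy setup, not the authors' simulation design: markers are assumed independent and coded 0/1/2, QTLs are assumed to be a random subset of the markers (so every QTL is in perfect LD with a marker), and the ridge penalty is set from the usual ridge/BLUP equivalence with known heritability `h2`. Function names and default values are hypothetical.

```python
import numpy as np


def ridge_marker_effects(X, y, lam):
    """Ridge regression marker effects: solve (X'X + lam * I) b = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def genomic_selection_accuracy(n_train=500, n_test=200, n_markers=1000,
                               n_qtl=50, h2=0.5, seed=0):
    """Correlation between predicted and true breeding values of test individuals."""
    rng = np.random.default_rng(seed)
    n = n_train + n_test
    # Hypothetical genotypes: independent biallelic markers, no LD between markers
    X = rng.binomial(2, 0.5, size=(n, n_markers)).astype(float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)          # standardise each marker
    # QTLs taken as a random subset of markers (perfect QTL-marker LD)
    qtl = rng.choice(n_markers, size=n_qtl, replace=False)
    g = X[:, qtl] @ rng.normal(0.0, 1.0, size=n_qtl)   # true breeding values
    var_e = g.var() * (1.0 - h2) / h2                  # residual variance giving heritability h2
    y = g + rng.normal(0.0, np.sqrt(var_e), size=n)    # phenotypes
    X_tr = X[:n_train]
    y_tr = y[:n_train] - y[:n_train].mean()            # centre training phenotypes
    # With standardised markers, var_e / var(marker effect) = n_markers * (1 - h2) / h2
    lam = n_markers * (1.0 - h2) / h2
    b = ridge_marker_effects(X_tr, y_tr, lam)
    g_hat = X[n_train:] @ b                            # predicted breeding values (test set)
    return np.corrcoef(g_hat, g[n_train:])[0, 1]
```

Repeating `genomic_selection_accuracy` over several seeds gives a Monte Carlo accuracy against which an analytical expression or a proxy can be compared; introducing LD structure among markers, or placing QTLs between markers, would be the natural next step toward the setting described above.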
We consider the likelihood ratio test (LRT) process for testing the absence of a QTL (a quantitative trait locus, i.e., a gene with a quantitative effect on a trait) on the interval [0, T] representing a chromosome. The observations are the trait and the composition of the genome at some locations called "markers". We give the asymptotic distribution of this LRT process under the null hypothesis that there is no QTL on [0, T] and under local alternatives with a QTL at t in [0, T]. We show that the LRT is asymptotically the square of a Gaussian process, and we describe this process as a "non-linear interpolated and normalized process". We propose a simple method to calculate the maximum of the LRT process using only statistics at the markers and their ratio. This gives a new method to calculate thresholds for QTL detection.
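Because the LRT is asymptotically the square of a Gaussian process, a detection threshold can be approximated by simulating that process and taking a quantile of its maximum. The sketch below makes assumptions not stated in the abstract: a backcross design, Haldane's map function (positions in Morgans), and the resulting Ornstein-Uhlenbeck correlation exp(-2|t - s|) between marker score statistics, which lets the process be generated marker by marker as a Markov chain. It only takes the maximum over the markers and ignores the interpolation between markers, so it yields a lower bound on the threshold rather than the method proposed above. The function name and parameters are hypothetical.

```python
import numpy as np


def marker_lrt_threshold(marker_pos, alpha=0.05, n_sim=100_000, seed=0):
    """Monte Carlo (1 - alpha) quantile of max_k Z(t_k)^2 under the null hypothesis.

    Assumes a backcross with Haldane's map function, so Z is an Ornstein-Uhlenbeck
    process with correlation exp(-2 |t - s|) for positions in Morgans.
    """
    rng = np.random.default_rng(seed)
    pos = np.sort(np.asarray(marker_pos, dtype=float))
    rho = np.exp(-2.0 * np.diff(pos))            # correlation between adjacent markers
    z = rng.standard_normal(n_sim)               # Z at the first marker
    running_max = z ** 2
    for r in rho:
        # Markov (OU) update keeps each Z(t_k) standard normal with the right correlation
        z = r * z + np.sqrt(1.0 - r ** 2) * rng.standard_normal(n_sim)
        running_max = np.maximum(running_max, z ** 2)
    return np.quantile(running_max, 1.0 - alpha)
```

With a single marker the threshold reduces to the chi-square(1) quantile (about 3.84 at `alpha=0.05`); adding markers along the chromosome raises it, since more correlated tests are being maximized.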