Likelihood-based parentage inference depends on the distribution of a likelihood-ratio statistic, which, in most cases of interest, cannot be determined exactly but only approximated by Monte Carlo simulation. We provide importance-sampling algorithms for efficiently approximating very small tail probabilities in the distribution of the likelihood-ratio statistic. These importance-sampling methods allow the estimation of small false-positive rates and hence permit likelihood-based inference of parentage in large studies involving a great number of potential parents and many potential offspring. We investigate the performance of these importance-sampling algorithms in the context of parentage inference using single-nucleotide polymorphism (SNP) data and find that they may accelerate the computation of tail probabilities more than 1 millionfold. We subsequently use the importance-sampling algorithms to calculate the power available with SNPs for large-scale parentage studies, paying particular attention to the effect of genotyping errors and the occurrence of related individuals among the members of the putative mother-father-offspring trios. These simulations show that 60-100 SNPs may allow accurate pedigree reconstruction, even in situations involving thousands of potential mothers, fathers, and offspring. In addition, we compare the power of exclusion-based parentage inference to that of the likelihood-based method. Likelihood-based inference is much more powerful under many conditions; exclusion-based inference would require 40% more SNP loci to achieve the same accuracy as the likelihood-based approach in one common scenario. Our results demonstrate that SNPs are a powerful tool for parentage inference in large managed and/or natural populations.
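The general idea of estimating a very small tail probability by importance sampling can be sketched with a toy model. This is not the authors' algorithm: it assumes independent loci at which a candidate either "matches" or not, with match probability `q0` under the null (unrelated) hypothesis and `q1` under the alternative, so that the likelihood ratio grows with the number of matches `K`. All function names and parameter values are hypothetical. Draws are taken under the alternative, where large values of `K` are common, and each draw is reweighted by the null-to-alternative likelihood ratio.

```python
import math
import random

def exact_tail(n_loci, q0, k_min):
    """Exact P(K >= k_min) for K ~ Binomial(n_loci, q0), used as a reference."""
    return sum(math.comb(n_loci, k) * q0**k * (1 - q0)**(n_loci - k)
               for k in range(k_min, n_loci + 1))

def importance_sampling_tail(n_loci, q0, q1, k_min, n_draws, rng):
    """Estimate P_null(K >= k_min) by sampling K under the alternative (q1)
    and weighting each tail draw by P_null(draw) / P_alt(draw)."""
    log_w_match = math.log(q0 / q1)              # weight factor per matching locus
    log_w_miss = math.log((1 - q0) / (1 - q1))   # weight factor per non-matching locus
    total = 0.0
    for _ in range(n_draws):
        k = sum(rng.random() < q1 for _ in range(n_loci))
        if k >= k_min:
            total += math.exp(k * log_w_match + (n_loci - k) * log_w_miss)
    return total / n_draws

# Toy values (assumptions chosen only for illustration).
rng = random.Random(1)
n_loci, q0, q1, k_min = 80, 0.3, 0.8, 64
exact = exact_tail(n_loci, q0, k_min)
estimate = importance_sampling_tail(n_loci, q0, q1, k_min, 20000, rng)
print(f"exact = {exact:.3e}, importance sampling = {estimate:.3e}")
```

In this toy setting the tail event is so rare under the null that naive simulation from the null would almost never observe it, whereas roughly half of the draws from the alternative land in the tail and contribute to the weighted estimate.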
Estimating the accuracy of genetic stock identification (GSI) that can be expected given a previously collected baseline requires simulation. The conventional method involves repeatedly simulating mixtures by resampling from the baseline, simulating new baselines by resampling from the baseline, and analyzing the simulated mixtures with the simulated baselines. We show that this overestimates the predicted accuracy of GSI. The bias is profound for closely related populations and increases as more genetic data (loci and/or alleles) are added to the analysis. We develop a new method based on leave-one-out cross validation and show that it yields essentially unbiased estimates of GSI accuracy. Applying both our method and the conventional method to a coastwide baseline of 166 Chinook salmon (Oncorhynchus tshawytscha) populations shows that the conventional method provides severely biased predictions of accuracy for some individual populations. The bias for reporting units (aggregations of closely related populations) is moderate, but still present.
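The contrast between reusing baseline individuals and leaving each one out can be illustrated with a toy self-assignment experiment. This is a simplified sketch, not the authors' procedure: it assumes haploid biallelic markers, two populations with identical true allele frequencies (so the true assignment accuracy is about 50%), and individual assignment rather than mixture analysis. Scoring individuals against counts that still include them ("resubstitution") stands in here for any scheme that reuses the baseline. All names are hypothetical.

```python
import math
import random

def simulate_baseline(n_pops, n_per_pop, freqs, rng):
    """Return {pop: [genotype, ...]}; a genotype is a list of 0/1 alleles (haploid toy)."""
    return {pop: [[int(rng.random() < p) for p in freqs] for _ in range(n_per_pop)]
            for pop in range(n_pops)}

def log_lik(genotype, counts, n):
    """Log-likelihood of a genotype given allele counts, with add-one-half smoothing."""
    ll = 0.0
    for allele, c in zip(genotype, counts):
        p = (c + 0.5) / (n + 1.0)
        ll += math.log(p if allele else 1.0 - p)
    return ll

def self_assignment_accuracy(baseline, leave_one_out):
    """Fraction of baseline individuals assigned to their own population."""
    n_loci = len(next(iter(baseline.values()))[0])
    counts = {pop: [sum(g[l] for g in s) for l in range(n_loci)]
              for pop, s in baseline.items()}
    sizes = {pop: len(s) for pop, s in baseline.items()}
    correct = total = 0
    for true_pop, samples in baseline.items():
        for g in samples:
            scores = {}
            for pop in baseline:
                c, n = counts[pop], sizes[pop]
                if leave_one_out and pop == true_pop:
                    # Remove this individual's own alleles before scoring it.
                    c = [ci - gi for ci, gi in zip(c, g)]
                    n -= 1
                scores[pop] = log_lik(g, c, n)
            correct += max(scores, key=scores.get) == true_pop
            total += 1
    return correct / total

rng = random.Random(7)
freqs = [0.5] * 400  # identical true frequencies in both populations
baseline = simulate_baseline(2, 100, freqs, rng)
resub = self_assignment_accuracy(baseline, leave_one_out=False)
loo = self_assignment_accuracy(baseline, leave_one_out=True)
print(f"resubstitution accuracy: {resub:.2f}")
print(f"leave-one-out accuracy:  {loo:.2f}")
```

Because each individual's own alleles inflate the likelihood of its source population, the resubstitution figure should come out well above chance even though the populations are indistinguishable, and the inflation grows with the number of loci. The leave-one-out figure should stay near 50%.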