Pathogen genomics is a powerful tool for tracking infectious disease transmission. In malaria, identity-by-descent (IBD) is used to assess the genetic relatedness between parasites and has been used to study transmission and importation. In theory, IBD can be used to distinguish genealogical relationships to reconstruct transmission history or identify parasites for quantitative-trait-locus experiments. MalKinID (Malaria Kinship Identifier) is a new classification model designed to identify genealogical relationships among malaria parasites based on genome-wide IBD proportions and IBD segment distributions. MalKinID was calibrated to the genomic data from three laboratory-based genetic crosses (yielding 440 parent-child [PC] and 9060 full-sibling [FS] comparisons). MalKinID identified lab-generated F1 progeny with >80% sensitivity and showed that 0.39 (95% CI 0.28, 0.49) of the second-generation progeny of an NF54 and NHP4026 cross were F1s and 0.56 (0.45, 0.67) were backcrosses of an F1 with the parental NF54 strain. In simulated outcrossed importations, MalKinID reconstructed genealogical history with high precision and sensitivity, with F1-scores exceeding 0.84. However, when importation involved inbreeding, such as during serial co-transmission, the precision and sensitivity of MalKinID declined, with F1-scores (the harmonic mean of precision and sensitivity) of 0.76 (0.56, 0.92) and 0.23 (0.0, 0.4) for PC and FS and <0.05 for second-degree and third-degree relatives. Disentangling inbred relationships required adapting MalKinID to perform multi-sample comparisons. Genealogical inference has the most power when 1) outcrossing is the norm or 2) multi-sample comparisons based on a predefined pedigree are used. MalKinID lays the foundations for using IBD to track parasite transmission history and for separating progeny for quantitative-trait-locus experiments.
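The abstract above reports F1-scores, defined there as the harmonic mean of precision and sensitivity. A minimal sketch of that calculation follows. The function name `f1_score` and the input values are assumptions chosen for illustration; they are not taken from MalKinID or its results.

```python
def f1_score(precision: float, sensitivity: float) -> float:
    """Return the harmonic mean of precision and sensitivity (recall)."""
    # Guard against division by zero when both inputs are zero.
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical inputs, for illustration only (not values from the paper).
example = f1_score(0.9, 0.6)
print(round(example, 2))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a classifier with high precision but low sensitivity (or the reverse) still receives a low F1-score.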
Spatial patterns in genetic diversity are shaped by individuals dispersing from their parents and larger-scale population movements. It has long been appreciated that these patterns of movement shape the underlying genealogies along the genome, leading to geographic patterns of isolation by distance in contemporary population genetic data. However, extracting the enormous amount of information contained in genealogies along recombining sequences has, until recently, not been computationally feasible. Here we capitalize on important recent advances in gene-genealogy reconstruction and develop methods to use thousands of trees to estimate time-varying per-generation dispersal rates and to locate the genetic ancestors of a sample back through time. We take a likelihood approach in continuous space using a simple approximate model (branching Brownian motion) as our prior distribution of spatial genealogies. After testing our method with simulations, we apply it to the 1001 Genomes dataset of over one thousand Arabidopsis thaliana genomes sampled across a wide geographic extent. We detect a very high dispersal rate in the recent past, especially longitudinally, and use inferred ancestor locations to visualize many examples of recent long-distance dispersal and admixture. We also use inferred ancestor locations to identify the origin and ancestry of the North American expansion and to depict alternative geographic ancestries stemming from multiple glacial refugia. Our method highlights the huge amount of information about past dispersal events and population movements contained in genome-wide genealogies.
Scalable methods for estimating marginal coalescent trees across the genome present new opportunities for studying evolution and have generated considerable excitement, with new methods extending scalability to thousands of samples. Benchmarking of the available methods has revealed general tradeoffs between accuracy and scalability, but performance in downstream applications has not always been easily predictable from general performance measures, suggesting that specific features of the ARG may be important for specific downstream applications of estimated ARGs. To exemplify this point, we benchmark ARG estimation methods with respect to a specific set of methods for estimating the historical time course of a population-mean polygenic score (PGS) using the marginal coalescent trees encoded by the ancestral recombination graph (ARG). Here we examine the performance in simulation of six ARG estimation methods: ARGweaver, RENT+, Relate, tsinfer+tsdate, ARG-Needle/ASMC-clust, and SINGER, using their estimated coalescent trees and examining bias, mean squared error (MSE), confidence interval coverage, and Type I and II error rates of the downstream methods. Although it does not scale to the sample sizes attainable by other new methods, SINGER produced the most accurate estimated PGS histories in many instances, even when Relate, tsinfer+tsdate, and ARG-Needle/ASMC-clust used samples ten times as large as those used by SINGER. In general, the best choice of method depends on the number of samples available and the historical time period of interest. In particular, the unprecedented sample sizes allowed by Relate, tsinfer+tsdate, and ARG-Needle/ASMC-clust are of greatest importance when the recent past is of interest; further back in time, most of the tree has coalesced, and differences in contemporary sample size are less salient.
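The abstract above evaluates estimators by bias, mean squared error (MSE), and confidence interval coverage. As a rough sketch of how those three summaries are commonly computed across simulation replicates, consider the following. The function `summarize_replicates` and all numbers are hypothetical and are not drawn from the study or from any of the named tools.

```python
from statistics import mean


def summarize_replicates(estimates, lowers, uppers, truth):
    """Return (bias, MSE, CI coverage) for replicate estimates of one true value."""
    errors = [est - truth for est in estimates]
    # Bias: average signed error across replicates.
    bias = mean(errors)
    # MSE: average squared error across replicates.
    mse = mean([err ** 2 for err in errors])
    # Coverage: fraction of replicate intervals that contain the true value.
    covered = [1.0 if lo <= truth <= hi else 0.0 for lo, hi in zip(lowers, uppers)]
    coverage = mean(covered)
    return bias, mse, coverage


# Hypothetical replicate results, for illustration only.
bias, mse, coverage = summarize_replicates(
    estimates=[1.0, 3.0],
    lowers=[0.0, 2.5],
    uppers=[2.0, 4.0],
    truth=2.0,
)
print(bias, mse, coverage)
```

An estimator can be unbiased on average yet have a large MSE or poor coverage, which is one reason benchmarks report all three.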