Imputation is one of the key steps in the preprocessing and quality control protocol of any genetic study. Most imputation algorithms were originally developed for use in human genetics and are thus optimized for a high level of genetic diversity. Different versions of BEAGLE were evaluated on genetic datasets of doubled haploids of two European maize landraces, a commercial breeding line and a diversity panel in chicken, respectively. These datasets differ in their levels of genetic diversity and structure, which can be taken into account in BEAGLE by parameter tuning. Especially for phasing, BEAGLE 5.0 outperformed the newest version (5.1), which in turn also led to improved imputation. Earlier versions were far more dependent on the adaptation of parameters in all our tests. For all versions, the parameter ne (effective population size) had a major effect on the error rate for imputation of ungenotyped markers, reducing error rates by up to 98.5%. Further improvement was obtained by tuning the parameters affecting the structure of the haplotype cluster that is used to initialize the underlying Hidden Markov Model of BEAGLE. The number of markers with extremely high error rates for the maize datasets was more than halved by the use of a flint reference genome (F7, PE0075 etc.) instead of the commonly used B73. On average, error rates for imputation of ungenotyped markers were reduced by 8.5% by excluding genetically distant individuals from the reference panel for the chicken diversity panel. To optimize imputation accuracy, one has to find a balance between representing as much of the genetic diversity as possible and avoiding the introduction of noise by including genetically distant individuals.
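The error rate for imputation of ungenotyped markers mentioned above can be illustrated with a minimal Python sketch. It assumes haploid 0/1 coding (reasonable for doubled haploids, which are fully homozygous) and uses a deliberately naive stand-in imputer that fills each hidden marker with the most common allele in the reference panel. This is not BEAGLE's algorithm, and the function names and toy data are hypothetical.

```python
from collections import Counter


def major_allele_impute(reference, target, hidden_markers):
    """Fill hidden marker columns of `target` with the reference panel's most common allele.

    Naive placeholder imputer for illustration only (NOT BEAGLE's HMM).
    """
    imputed = [list(row) for row in target]
    for j in hidden_markers:
        major = Counter(row[j] for row in reference).most_common(1)[0][0]
        for row in imputed:
            row[j] = major
    return imputed


def error_rate(truth, imputed, hidden_markers):
    """Share of hidden (ungenotyped) entries where the imputed allele differs from the truth."""
    total = len(truth) * len(hidden_markers)
    if total == 0:
        raise ValueError("no hidden entries to evaluate")
    wrong = sum(
        truth[i][j] != imputed[i][j]
        for i in range(len(truth))
        for j in hidden_markers
    )
    return wrong / total


# Toy data (made up): rows are lines, columns are markers.
reference = [[0, 1, 1], [0, 1, 0], [1, 1, 0]]
truth = [[0, 1, 0], [1, 1, 1]]
hidden_markers = [1, 2]  # markers treated as ungenotyped in the target lines

# Hide the chosen markers in the target, impute them, and score the result.
target = [[None if j in hidden_markers else a for j, a in enumerate(row)] for row in truth]
imputed = major_allele_impute(reference, target, hidden_markers)
rate = error_rate(truth, imputed, hidden_markers)
```

In a real evaluation the stand-in imputer would be replaced by a BEAGLE run, and the same error rate would be compared across parameter settings such as ne.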
The R-package MoBPS provides a computationally efficient and flexible framework to simulate complex breeding programs and compare their economic and genetic impact. Simulations are performed on the level of individuals. MoBPS uses a highly efficient implementation with bit-wise data storage and matrix multiplications from the associated R-package miraculix, which makes it possible to handle large-scale populations. Individual haplotypes are not stored but are instead derived automatically from points of recombination and mutations. The modular structure of MoBPS makes it possible to combine rather coarse simulations, as needed to generate founder populations, with very detailed modeling of today's complex breeding programs, making use of all available biotechnologies. MoBPS provides pre-implemented functions for common breeding practices such as optimum genetic contributions and single-step GBLUP, but also allows the user to replace certain steps with personalized and/or self-written solutions.
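The idea of deriving haplotypes from recombination points and mutations, rather than storing them, can be sketched as follows. This is a conceptual illustration in Python, not MoBPS's actual data structure or API. The function name, the convention that copying starts on the first parental haplotype, and the 0/1 allele coding are all assumptions.

```python
def derive_haplotype(founder_a, founder_b, breakpoints, mutations=()):
    """Rebuild a haplotype from two parental haplotypes.

    breakpoints: marker indices at which copying switches to the other strand
                 (copying starts on founder_a; assumed convention).
    mutations:   marker indices whose allele is flipped (0 <-> 1) afterwards.
    """
    if len(founder_a) != len(founder_b):
        raise ValueError("parental haplotypes must have equal length")
    sources = (founder_a, founder_b)
    switch_points = set(breakpoints)
    current = 0
    result = []
    for pos in range(len(founder_a)):
        if pos in switch_points:
            current = 1 - current  # recombination: switch parental strand
        result.append(sources[current][pos])
    for pos in mutations:
        result[pos] = 1 - result[pos]  # point mutation flips the allele
    return result


# Only the breakpoints and mutation positions need to be stored per individual.
hap = derive_haplotype([0] * 6, [1] * 6, breakpoints=[2, 4], mutations=[5])
```

Storing a handful of breakpoints and mutation positions per individual takes far less memory than storing every allele, which is the motivation stated in the abstract.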
Single nucleotide polymorphisms (SNPs), genotyped with arrays, have become a widely used marker type in population genetic analyses over the last 10 years. However, compared to whole genome re-sequencing data, arrays are known to lack a substantial proportion of globally rare variants and tend to be biased towards variants present in the populations involved in the development of the respective array. This affects population genetic estimators and is known as SNP ascertainment bias. We investigated factors contributing to ascertainment bias in array development by redesigning the Axiom™ Genome-Wide Chicken Array in silico and evaluating changes in allele frequency spectra and heterozygosity estimates in a stepwise manner. A sequential reduction of rare alleles during the development process was shown. This was mainly caused by the identification of SNPs in a limited set of populations and a within-population selection of common SNPs when aiming for equidistant spacing. These effects were shown to be less severe with a larger discovery panel. Additionally, a generally massive overestimation of expected heterozygosity for the ascertained SNP sets was shown. In the case of the original array, this overestimation was 24% higher for populations involved in the discovery process than for populations not involved. The same was observed after the SNP discovery step in the redesign. However, an unequal contribution of populations during the SNP selection can mask this effect but also adds uncertainty. Finally, we make suggestions for the design of specialized arrays for large-scale projects where whole genome re-sequencing techniques are still too expensive.
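The overestimation of expected heterozygosity can be made concrete with a small Python sketch. For a biallelic SNP with allele frequency p, expected heterozygosity is 2p(1-p), so discarding rare variants raises the average. The allele frequencies and the minor allele frequency threshold below are invented for illustration and are not taken from the study.

```python
def expected_heterozygosity(freqs):
    """Mean expected heterozygosity 2p(1-p) over biallelic SNPs."""
    if not freqs:
        raise ValueError("need at least one SNP")
    return sum(2 * p * (1 - p) for p in freqs) / len(freqs)


def ascertain_common(freqs, min_maf):
    """Keep only SNPs whose minor allele frequency reaches `min_maf` (toy ascertainment step)."""
    return [p for p in freqs if min(p, 1 - p) >= min_maf]


# Hypothetical allele frequencies as they might appear in sequence data.
all_snps = [0.01, 0.05, 0.2, 0.5]
array_snps = ascertain_common(all_snps, min_maf=0.1)  # rare variants are dropped

he_all = expected_heterozygosity(all_snps)
he_array = expected_heterozygosity(array_snps)
```

With these toy values the ascertained set keeps only the two common SNPs, and its mean expected heterozygosity is clearly higher than that of the full set.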
The concept of haplotype blocks has been shown to be useful in genetics. Fields of application range from the detection of regions under positive selection to statistical methods that make use of dimension reduction...
KEYWORDS: imputation; BEAGLE; reference panel; reference genome
... special tools for both cases have been developed. As fully homozygous lines are commonly present in crops, the software TASSEL (Bradbury et al. 2007) was developed to work well on this data structure (Swarts et al. 2014). Since pedigrees in animal breeding can be much denser than in human populations (both w.r.t. depth and family size), tools like FImpute (Sargolzaei et al. 2014) and AlphaImpute (Hickey et al. 2011) have been developed to fully utilize this information.
In the imputation process all those methods use the fact that physically close markers are likely inherited together, resulting in non-random associations of alleles. These methods thereby rely on the knowledge of the physical position or at least the order of markers for modeling linkage and thus the resulting linkage disequilibrium (LD). In contrast, the software LinkImpute (Money et al. 2015) accounts for LD between pairs of markers and not their physical positions. This can be particularly relevant for species in which no reference sequence is available or whose genomes are known for a high amount of translocations and inversions. In contrast to other methods using a HMM, the Markov chain in BEAGLE is not initialized by ...
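LD between a pair of markers, as referred to in the excerpt, is commonly quantified with the squared correlation r². The Python sketch below computes it from phased 0/1 haplotypes. It only illustrates the statistic and is not taken from LinkImpute or any other tool named here; the function name and toy data are hypothetical.

```python
def ld_r2(marker_a, marker_b):
    """Squared correlation r^2 between two biallelic markers coded 0/1 across haplotypes."""
    n = len(marker_a)
    if n == 0 or n != len(marker_b):
        raise ValueError("markers must be non-empty and of equal length")
    p_a = sum(marker_a) / n
    p_b = sum(marker_b) / n
    # Frequency of haplotypes carrying allele 1 at both markers.
    p_ab = sum(1 for a, b in zip(marker_a, marker_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic marker")
    d = p_ab - p_a * p_b  # LD coefficient D
    return d * d / denom


# Toy haplotypes: identical markers are in complete LD, independent ones are not.
r2_linked = ld_r2([0, 0, 1, 1], [0, 0, 1, 1])
r2_unlinked = ld_r2([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the statistic needs only the allele states at the two markers, it can be computed without knowing where the markers sit on the genome.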