Background: Core collections provide genebank curators and plant breeders with a way to reduce the size of their collections and populations while minimizing the impact on genetic diversity and allele frequency. Many methods have been proposed to generate core collections, often using distance metrics to quantify the similarity of two accessions, based on genetic marker data or phenotypic traits. Core Hunter is a multi-purpose core subset selection tool that uses local search algorithms to generate subsets relying on one or more metrics, including several distance metrics and allelic richness. Results: In version 3 of Core Hunter (CH3) we have incorporated two new, improved methods for summarizing distances to quantify diversity or representativeness of the core collection. A comparison of CH3 and Core Hunter 2 (CH2) showed that these new metrics can be effectively optimized with less complex algorithms than those used in CH2. CH3 is more effective at maximizing the improved diversity metric than CH2, still ensures a high average and minimum distance, and is faster for large datasets. Using CH3, a simple stochastic hill-climber is able to find highly diverse core collections, and the more advanced parallel tempering algorithm further increases the quality of the core and further reduces variability across independent samples. We also evaluate the ability of CH3 to simultaneously maximize diversity and either representativeness or allelic richness, and compare the results with those of the GDOpt and SimEli methods. CH3 can sample cores that are as representative as those of GDOpt, which was specifically designed for this purpose, and is able to construct cores that are simultaneously more diverse, and either more representative or richer in alleles, than those obtained by SimEli. Conclusions: In version 3, Core Hunter has been updated to include two new core subset selection metrics that construct cores for representativeness or diversity, with improved performance. It combines the strengths of other methods and outperforms them, as it (simultaneously) optimizes a variety of metrics. In addition, CH3 is an improvement over CH2, with the option to use genetic marker data or phenotypic traits, or both, and improved speed. Core Hunter 3 is freely available at http://www.corehunter.org.
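The abstract above does not name its two new distance summaries. As an illustration only, the sketch below assumes nearest-neighbour style summaries: the mean distance from each core entry to its nearest other entry (a diversity score, higher is better) and the mean distance from every accession to its nearest core entry (a representativeness score, lower is better). The function names, the Euclidean distance and the toy data are hypothetical and are not taken from Core Hunter.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two trait vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def entry_to_nearest_entry(data, core):
    """Mean distance from each core entry to its nearest other core entry.

    Assumed diversity summary: higher means entries are well separated.
    """
    total = 0.0
    for i in core:
        total += min(euclidean(data[i], data[j]) for j in core if j != i)
    return total / len(core)

def accession_to_nearest_entry(data, core):
    """Mean distance from every accession to its nearest core entry.

    Assumed representativeness summary: lower means the core covers
    the whole collection more closely. Core entries contribute 0.
    """
    total = 0.0
    for row in data:
        total += min(euclidean(row, data[j]) for j in core)
    return total / len(data)

# Toy collection: five accessions described by a single trait value.
data = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,)]
spread = [0, 3]    # one entry from each cluster
clumped = [0, 1]   # both entries from the same cluster

spread_diversity = entry_to_nearest_entry(data, spread)
clumped_diversity = entry_to_nearest_entry(data, clumped)
spread_coverage = accession_to_nearest_entry(data, spread)
clumped_coverage = accession_to_nearest_entry(data, clumped)
```

On this toy collection the spread core scores better on both summaries, which is the kind of joint behaviour the abstract describes when diversity and representativeness are optimized together.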
Long-term genomic selection (GS) requires strategies that balance genetic gain with population diversity, to sustain progress for traits under selection, and to keep diversity for future breeding. In a simulation model for a recurrent selection scheme, we provide the first head-to-head comparison of two such existing strategies: genomic optimal contributions selection (GOCS), which limits realized genomic relationship among selection candidates, and weighted genomic selection (WGS), which upscales rare allele effects in GS. Compared to GS, both methods provide the same higher long-term genetic gain and a similar lower inbreeding rate, despite some inherent limitations. GOCS does not control the inbreeding rate component linked to trait selection, and, therefore, does not strike the optimal balance between genetic gain and inbreeding. This makes it less effective throughout the breeding scheme, and particularly so at the beginning, where genetic gain and diversity may not be competing. For WGS, truncation selection proved suboptimal to manage rare allele frequencies among the selection candidates. To overcome these limitations, we introduce two new set selection methods that maximize a weighted index balancing genetic gain with controlling expected heterozygosity (IND-HE) or maintaining rare alleles (IND-RA), and show that these outperform GOCS and WGS in a nearly identical way. While requiring further testing, we believe that the inherent benefits of the IND-HE and IND-RA methods will transfer from our simulation framework to many practical breeding settings, and are therefore a major step forward toward efficient long-term genomic selection.
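The abstract above does not give the formula of the weighted index. A minimal sketch of the idea behind an index that balances genetic gain with expected heterozygosity, under stated assumptions: biallelic markers coded as 0/1/2 allele counts, expected heterozygosity computed as the mean of 2p(1-p) over loci, and a hypothetical weight `alpha` between 0 and 1. In practice the two terms would need to be put on comparable scales, which is omitted here.

```python
def expected_heterozygosity(genotypes):
    """Mean over loci of 2p(1-p), with p the allele frequency in the set.

    genotypes: list of rows, each row holding 0/1/2 allele counts per locus.
    """
    n = len(genotypes)
    n_loci = len(genotypes[0])
    he = 0.0
    for locus in range(n_loci):
        p = sum(g[locus] for g in genotypes) / (2 * n)
        he += 2 * p * (1 - p)
    return he / n_loci

def weighted_index(selection, breeding_values, genotypes, alpha):
    """Hypothetical index: alpha * mean value + (1 - alpha) * heterozygosity."""
    mean_value = sum(breeding_values[i] for i in selection) / len(selection)
    he = expected_heterozygosity([genotypes[i] for i in selection])
    return alpha * mean_value + (1 - alpha) * he

# Toy candidates: two near-identical elite lines, one distinct line, one hybrid.
genotypes = [[2, 2], [2, 2], [0, 0], [1, 1]]
breeding_values = [1.0, 0.9, 0.2, 0.5]

top_two = [0, 1]   # what truncation selection would pick
mixed = [0, 2]     # sacrifices some value to keep both alleles

gain_only_top = weighted_index(top_two, breeding_values, genotypes, alpha=1.0)
gain_only_mixed = weighted_index(mixed, breeding_values, genotypes, alpha=1.0)
balanced_top = weighted_index(top_two, breeding_values, genotypes, alpha=0.5)
balanced_mixed = weighted_index(mixed, breeding_values, genotypes, alpha=0.5)
```

With `alpha=1.0` the index reduces to mean breeding value and prefers the two elite lines; with `alpha=0.5` the set that keeps both alleles at each locus scores higher. Because the index is evaluated on a whole set rather than on individuals, it has to be optimized by set selection instead of truncation.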
Background: Sampling core subsets from genetic resources while maintaining as much as possible the genetic diversity of the original collection is an important but computationally complex task for gene bank managers. The Core Hunter computer program was developed as a tool to generate such subsets based on multiple genetic measures, including both distance measures and allelic diversity indices. First, we investigate the effect of minimum (instead of the default mean) distance measures on the performance of Core Hunter. Second, we try to gain more insight into the performance of the original Core Hunter search algorithm through comparison with several other heuristics on several realistic datasets of varying size and allelic composition. Finally, we propose a new algorithm (Mixed Replica search) for Core Hunter II with the aim of improving the diversity of the constructed core sets and their corresponding generation times. Results: Our results show that the introduction of minimum distance measures leads to core sets in which all accessions are sufficiently distant from each other, which was not always obtained when optimizing mean distance alone. Comparison of the original Core Hunter algorithm, Replica Exchange Monte Carlo (REMC), with simpler heuristics shows that the simpler algorithms often give very good results with lower runtimes than REMC. However, the performance of the simpler algorithms is slightly worse than that of REMC under lower sampling intensities, and some heuristics clearly struggle with minimum distance measures. In comparison, the new advanced Mixed Replica search algorithm (MixRep), which uses heterogeneous replicas, was able to sample core sets with equal or higher diversity scores than REMC and the simpler heuristics, often using less computation time than REMC. Conclusion: The REMC search algorithm used in the original Core Hunter computer program performs well, sometimes leading to slightly better results than some of the simpler methods, although it doesn’t always give the best results. By switching to the new Mixed Replica algorithm, overall results and runtimes can be significantly improved. Finally, we recommend including minimum distance measures in the objective function when looking for core sets in which all accessions are sufficiently distant from each other. Core Hunter II is freely available as an open source project at http://www.corehunter.org.
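A small worked example may help show why mean distance alone can leave near-duplicates in a core. The sketch below uses hypothetical one-dimensional accessions and absolute difference as the distance; none of the names come from Core Hunter.

```python
from itertools import combinations

def pairwise_distances(points, core):
    """All pairwise distances between the selected entries."""
    return [abs(points[i] - points[j]) for i, j in combinations(core, 2)]

def mean_distance(points, core):
    """Average pairwise distance within the core."""
    d = pairwise_distances(points, core)
    return sum(d) / len(d)

def min_distance(points, core):
    """Smallest pairwise distance within the core."""
    return min(pairwise_distances(points, core))

# Accessions 0/1 and 2/3 are exact duplicates at the two extremes.
points = [0, 0, 9, 9, 3, 6]

extremes = [0, 1, 2, 3]   # both duplicate pairs
even = [0, 4, 5, 2]       # positions 0, 3, 6, 9

mean_extremes = mean_distance(points, extremes)
mean_even = mean_distance(points, even)
min_extremes = min_distance(points, extremes)
min_even = min_distance(points, even)
```

Here the core of duplicated extremes has the higher mean distance (6.0 against 5.0) even though its minimum distance is 0, while the evenly spaced core has a minimum distance of 3. Including a minimum distance term in the objective penalizes the first core, consistent with the recommendation in the abstract.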
Summary: This paper describes the Java Metaheuristics Search framework (JAMES, v1.1): an object‐oriented Java framework for discrete optimization using local search algorithms that exploits the generality of such metaheuristics by clearly separating search implementation and application from problem specification. A wide range of generic local searches are provided, including (stochastic) hill climbing, tabu search, variable neighbourhood search and parallel tempering. These can be applied to any user‐defined problem by plugging in a custom neighbourhood for the corresponding solution type. Using an automated analysis workflow, the performance of different search algorithms can be compared in order to select an appropriate optimization strategy. Implementations of specific components are included for subset selection, such as a predefined solution type, generic problem definition and several subset neighbourhoods used to modify the set of selected items. Additional components for other types of problems (e.g. permutation problems) are provided through an extensions module which also includes the analysis workflow. In comparison with existing Java metaheuristics frameworks that mainly focus on population‐based algorithms, JAMES has a much lower memory footprint and promotes efficient application of local searches by taking full advantage of move‐based evaluation. Releases of JAMES are deployed to the Maven Central Repository so that the framework can easily be included as a dependency in other Java applications. The project is fully open source and hosted on GitHub. More information can be found at http://www.jamesframework.org. Copyright © 2016 John Wiley & Sons, Ltd.
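The combination of a stochastic hill climber, a swap neighbourhood for subset selection and move-based evaluation can be sketched as follows. This is plain Python written for illustration, not the JAMES API; the objective (sum of pairwise distances within the subset), the function names and the toy data are all assumptions.

```python
import random

def total_pairwise(dist, selected):
    """Full evaluation: sum of distances over all pairs in the subset."""
    items = sorted(selected)
    return sum(dist[a][b] for i, a in enumerate(items) for b in items[i + 1:])

def swap_delta(dist, selected, removed, added):
    """Move-based evaluation: change in score when swapping one item.

    Only distances involving the removed or added item are touched,
    so the cost is linear in the subset size instead of quadratic.
    """
    rest = [k for k in selected if k != removed]
    return sum(dist[added][k] for k in rest) - sum(dist[removed][k] for k in rest)

def random_descent(dist, subset_size, steps, seed=0):
    """Stochastic hill climbing: sample one random swap, keep it if it improves."""
    rng = random.Random(seed)
    n = len(dist)
    selected = set(rng.sample(range(n), subset_size))
    score = total_pairwise(dist, selected)
    for _ in range(steps):
        removed = rng.choice(sorted(selected))
        added = rng.choice(sorted(set(range(n)) - selected))
        delta = swap_delta(dist, selected, removed, added)
        if delta > 0:
            selected.remove(removed)
            selected.add(added)
            score += delta
    return selected, score

# Toy problem: items on a line, distance is the absolute difference.
points = [0, 1, 2, 3, 10, 11, 12, 20]
dist = [[abs(a - b) for b in points] for a in points]
selected, score = random_descent(dist, subset_size=3, steps=200)
```

The search logic never looks inside the objective beyond calling the full and delta evaluations, which mirrors the separation between search implementation and problem specification described in the abstract. Like any local search, the climber may stop in a local optimum, which is the case more advanced searches such as tabu search or parallel tempering are meant to address.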
Background: Over the last decade, genetic marker-based plant breeding strategies have gained increasing attention because genotyping technologies are no longer limiting. Now the challenge is to optimally use genetic markers in practical breeding schemes. For simple traits such as some disease resistances, it is possible to target a fixed multi-locus allele configuration at a small number of causal or linked loci. Efficiently obtaining this genetic ideotype from a given set of parental genotypes is known as the marker-assisted gene pyramiding problem. Previous methods either imposed strong restrictions or used black box integer programming solutions, while this paper explores the power of an explicit heuristic approach that exploits the underlying genetic structure to prune the search space. Results: Gene Stacker is introduced as a novel approach to marker-assisted gene pyramiding, combining an explicit directed acyclic graph model with a pruned generation algorithm inspired by a simple exhaustive search. Both exact and heuristic pruning criteria are applied to reduce the number of generated schedules. It is shown that this approach can effectively be used to obtain good solutions for stacking problems of varying complexity. For more complex problems, the heuristics make it possible to obtain valuable approximations. For smaller problems, fewer heuristics can be applied, resulting in an interesting quality-runtime tradeoff. Gene Stacker is competitive with previous methods and often finds better and/or additional solutions within reasonable time because of the powerful heuristics. Conclusions: The proposed approach was confirmed to be feasible in combination with heuristics to cope with realistic, complex stacking problems. The inherent flexibility of this approach makes it easy to address important breeding constraints so that the obtained schedules can be widely used in practice without major modifications. In addition, the ideas applied for Gene Stacker can be incorporated in and extended for a plant breeding context that also addresses, for example, complex quantitative traits or conservation of genetic background. Gene Stacker is freely available as open source software at http://genestacker.ugent.be. The website also provides documentation and examples of how to use Gene Stacker. Electronic supplementary material: The online version of this article (doi:10.1186/s12863-014-0154-z) contains supplementary material, which is available to authorized users.