The promise of association genetics to identify genes or genomic regions controlling complex traits has generated a flurry of interest. Such phenotype-genotype associations could be useful to accelerate tree breeding cycles and to increase precision and selection intensity for late-expressing, low-heritability traits. However, the prospects of association genetics in highly heterozygous undomesticated forest trees can be severely impacted by the presence of cryptic population and pedigree structure. To investigate how to better account for this, we compared the general linear model (GLM) and five combinations of the Unified Mixed Model (UMM) using data from a low-density genome-wide association study for growth and wood property traits carried out in a Eucalyptus globulus population (n = 303) with 7,680 Diversity Array Technology (DArT) markers. Model comparisons were based on the degree of deviation from the uniform distribution and estimates of the mean square differences between the observed and expected p-values of all significant marker-trait associations detected. Our analysis revealed the presence of population and family structure. No single model was best for all traits. Striking differences in detection power and accuracy were observed among the different models, especially when population structure was not accounted for. Overall, the UMM approach produced superior results to the GLM for all traits. Following stringent correction for false discoveries, 18 marker-trait associations were detected: 16 for tree diameter growth and two for lignin monomer composition (S∶G ratio), a key wood property trait. The two DArT markers associated with S∶G ratio on chromosome 10 physically map within 1 Mbp of the ferulate 5-hydroxylase (F5H) gene, providing a putative independent validation of this marker-trait association.
This study demonstrates the merit of collectively integrating population structure and relatedness in association analyses in undomesticated, highly heterozygous forest trees, and provides additional insights into the nature of complex quantitative traits in Eucalyptus.
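The model-comparison criterion mentioned above, the mean square difference between observed and expected p-values, can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the version below is an assumption: expected p-values are taken as the uniform order statistics i/(n+1), and the function names are hypothetical. A model whose p-values follow the uniform distribution more closely gives a smaller value.

```python
def expected_uniform_pvalues(n):
    """Expected order statistics of n draws from Uniform(0, 1): i / (n + 1).

    Assumption: this is one common choice of expected p-values; the study
    may have used a different plotting position.
    """
    return [i / (n + 1) for i in range(1, n + 1)]


def mean_square_difference(pvalues):
    """Mean square difference between sorted observed p-values and the
    p-values expected under a uniform distribution."""
    if not pvalues:
        raise ValueError("at least one p-value is required")
    observed = sorted(pvalues)
    expected = expected_uniform_pvalues(len(observed))
    squared = [(o - e) ** 2 for o, e in zip(observed, expected)]
    return sum(squared) / len(observed)


if __name__ == "__main__":
    # Hypothetical p-values for two models fitted to the same trait.
    well_calibrated = expected_uniform_pvalues(9)
    inflated = [p ** 3 for p in well_calibrated]  # skewed toward small values
    print("well calibrated:", mean_square_difference(well_calibrated))
    print("inflated:", mean_square_difference(inflated))
```

Under these assumptions, a model that ignores population structure would be expected to show an excess of small p-values and therefore a larger mean square difference than a model that accounts for structure and relatedness.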
Restriction site-associated DNA sequencing (RADseq) and its derived protocols, such as double digest RADseq (ddRADseq), offer a flexible and highly cost-effective strategy for efficient plant genome sampling. This has become one of the most popular genotyping approaches for breeding, conservation, and evolution studies in model and non-model plant species. However, universal protocols do not always adapt well to non-model species. This study reports the development of an optimized and detailed ddRADseq protocol in Eucalyptus dunnii, a non-model species, which combines different aspects of published methodologies. The initial protocol was established using only two samples by selecting the best combination of enzymes, optimizing size selection, and simplifying lab procedures. Both single nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs) were determined with high accuracy after applying stringent bioinformatics settings and quality filters, with and without a reference genome. To scale the protocol up to 24 samples, we added barcoded adapters. We also applied automatic size selection and thereby obtained an optimal number of loci, the expected SNP locus density, and genome-wide distribution. Reliability and cross-sequencing platform compatibility were verified through dissimilarity coefficients of 0.05 between replicates. To our knowledge, this optimized ddRADseq protocol will allow users to go from the DNA sample to genotyping data in a highly accessible and reproducible way.