Eucalyptus polybractea is a small, multi-stemmed tree that is widely cultivated in Australia for the production of Eucalyptus oil. We report the hybrid assembly of the E. polybractea genome using both short- and long-read technologies. We generated 44 Gb of Illumina HiSeq short reads and 8 Gb of Nanopore long reads, representing approximately 83-fold and 15-fold genome coverage, respectively. The hybrid-assembled genome, after polishing, contained 24,864 scaffolds with a total length of 523 Mb (N50 = 40.3 kb; BUSCO-calculated genome completeness of 94.3%). The genome contained 35,385 predicted protein-coding genes, detected by combining homology-based and de novo approaches. We provide the first genome assembled from hybrid sequence data for the highly diverse Eucalyptus subgenus Symphyomyrtus, and show the value of including Nanopore long reads for improving both the contiguity and the completeness of the assembled genome. We anticipate that the E. polybractea genome will be an invaluable resource supporting a range of studies in genetics, population genomics and evolution of related species in Eucalyptus.
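The coverage and N50 figures quoted above follow from two simple calculations, sketched here in Python. This is an illustration only: the abstract does not state the genome-size estimate behind its coverage values, so the assembled length (523 Mb) is used as a stand-in, and the scaffold lengths in the N50 example are invented.

```python
from typing import Sequence


def coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def n50(lengths: Sequence[int]) -> int:
    """Largest length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Assumption: the assembled length is used as a proxy for the true genome size.
ASSEMBLY_LENGTH = 523e6

short_read_depth = coverage(44e9, ASSEMBLY_LENGTH)  # 44 Gb of Illumina reads
long_read_depth = coverage(8e9, ASSEMBLY_LENGTH)    # 8 Gb of Nanopore reads
print(f"{short_read_depth:.1f}x short-read, {long_read_depth:.1f}x long-read")

# Made-up scaffold lengths, purely to show how N50 is read off a sorted list.
toy_scaffolds = [100, 80, 60, 40, 20]
print(n50(toy_scaffolds))
```

With the assembly length as denominator, the depths come out at roughly 84x and 15x, close to the reported 83 and 15; the small difference is consistent with the authors having used a slightly larger genome-size estimate.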
Background
Partitioning involves estimating independent models of molecular evolution for different subsets of sites in a sequence alignment, and has been shown to improve phylogenetic inference. Current methods for estimating best-fit partitioning schemes, however, are only computationally feasible with datasets of fewer than 100 loci. This is a problem because datasets with thousands of loci are increasingly common in phylogenetics.
Methods
We develop two novel methods for estimating best-fit partitioning schemes on large phylogenomic datasets: strict and relaxed hierarchical clustering. These methods use information from the underlying data to cluster together similar subsets of sites in an alignment, and build on clustering approaches that have been proposed elsewhere.
Results
We compare the performance of our methods to each other, and to existing methods for selecting partitioning schemes. We demonstrate that while strict hierarchical clustering has the best computational efficiency on very large datasets, relaxed hierarchical clustering provides scalable efficiency and returns dramatically better partitioning schemes as assessed by common criteria such as AICc and BIC scores.
Conclusions
These two methods provide the best current approaches to inferring partitioning schemes for very large datasets. We provide free open-source implementations of the methods in the PartitionFinder software. We hope that the use of these methods will help to improve the inferences made from large phylogenomic datasets.
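Partitioning schemes are ranked above by AICc and BIC. As a minimal sketch of how such scores compare candidate schemes, the Python below applies the standard textbook formulas to three hypothetical schemes. The scheme names, log-likelihoods, parameter counts and alignment length are all invented for illustration, and this is not the PartitionFinder implementation.

```python
import math


def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 lnL."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood: float, k: int, n: int) -> float:
    """AIC with the small-sample correction term 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic(log_likelihood, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 lnL."""
    return k * math.log(n) - 2 * log_likelihood


# Hypothetical schemes: lnL is the total log-likelihood, k the number of free parameters.
schemes = {
    "unpartitioned": {"lnL": -10500.0, "k": 10},
    "by_locus": {"lnL": -10300.0, "k": 40},
    "by_codon_position": {"lnL": -10150.0, "k": 120},
}
n_sites = 5000  # assumed alignment length, used as the sample size n

# Lower scores are better under both criteria.
best_bic = min(schemes, key=lambda s: bic(schemes[s]["lnL"], schemes[s]["k"], n_sites))
best_aicc = min(schemes, key=lambda s: aicc(schemes[s]["lnL"], schemes[s]["k"], n_sites))
print("best by BIC: ", best_bic)
print("best by AICc:", best_aicc)
```

With these invented numbers the two criteria disagree: BIC penalizes parameters more heavily and selects the intermediate scheme, while AICc selects the most parameter-rich one.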
Summary
Genome‐wide association studies (GWAS) have great promise for identifying the loci that contribute to adaptive variation, but the complex genetic architecture of many quantitative traits presents a substantial challenge.
We measured 14 morphological and physiological traits and identified single nucleotide polymorphism (SNP)‐phenotype associations in a Populus trichocarpa population distributed from California, USA to British Columbia, Canada. We used whole‐genome resequencing data of 882 trees with more than 6.78 million SNPs, coupled with multitrait association to detect polymorphisms with potentially pleiotropic effects. Candidate genes were validated with functional data.
Broad‐sense heritability (H²) ranged from 0.30 to 0.56 for morphological traits and 0.08 to 0.36 for physiological traits. In total, 4 and 20 gene models were detected using the single‐trait and multitrait association methods, respectively. Several of these associations were corroborated by additional lines of evidence, including co‐expression networks, metabolite analyses, and direct confirmation of gene function through RNAi.
Multitrait association identified many more significant associations than single‐trait association, potentially revealing pleiotropic effects of individual genes. This approach can be particularly useful for challenging physiological traits such as water‐use efficiency or complex traits such as leaf morphology, for which we were able to identify credible candidate genes by combining multitrait association with gene co‐expression and co‐methylation data.
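For readers unfamiliar with the heritability values reported in this Summary, broad-sense heritability is conventionally the share of phenotypic variance attributable to genetic variance. The Python sketch below uses that textbook definition with invented variance components. The Summary does not say how the variance components were estimated in the study, so this should not be read as the authors' procedure.

```python
def broad_sense_heritability(genetic_variance: float, environmental_variance: float) -> float:
    """Textbook definition: genetic variance divided by total phenotypic variance."""
    phenotypic_variance = genetic_variance + environmental_variance
    if phenotypic_variance <= 0:
        raise ValueError("total phenotypic variance must be positive")
    return genetic_variance / phenotypic_variance


# Invented variance components for a single trait (arbitrary units).
h2 = broad_sense_heritability(genetic_variance=3.0, environmental_variance=7.0)
print(f"H^2 = {h2:.2f}")
```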