The economically important Melaleuca alternifolia (tea tree) is the source of a terpene-rich essential oil with therapeutic and cosmetic uses around the world. Tea tree has been cultivated and bred in Australia since the 1990s, and the genetics and biochemistry of its terpene biosynthesis have been studied extensively. Here, we report a high-quality de novo genome assembly generated with Pacific Biosciences and Illumina sequencing. The genome was assembled into 3128 scaffolds with a total length of 362 Mb (N50 = 1.9 Mb), a substantially higher contiguity than that of a previous assembly (N50 = 8.7 kb). Using a combination of homology-based, RNA-seq evidence-based and ab initio prediction, 37,226 protein-coding genes were predicted. The genome assembly and annotation had high completeness scores of 98.1% and 89.4%, respectively. Sequence contiguity was sufficient to reveal extensive gene order conservation and chromosomal rearrangements in alignments with the Eucalyptus grandis and Corymbia citriodora genomes. This new genome extends the resources currently available for investigating the genome structure and gene family evolution of M. alternifolia. It will enable further comparative genomic studies in Myrtaceae to elucidate the genetic foundations of economically valuable traits in this crop.
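For readers unfamiliar with the contiguity statistic quoted above: N50 is the length L such that scaffolds of length at least L together account for at least half of the total assembly length. The following is a minimal Python sketch of that definition. The function name `n50` and the toy lengths are illustrative assumptions, not the scaffold lengths of this assembly or the tooling used by the authors.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total length.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("at least one sequence length is required")

    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy scaffold lengths (hypothetical, in arbitrary units).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

Because the statistic depends only on the sorted lengths, an assembly with fewer, longer scaffolds of the same total length yields a higher N50, which is why it is commonly used to compare contiguity between assemblies.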
Terpene synthases (TPS) are responsible for the terminal biosynthetic step of terpenoid production. They are encoded by a highly diverse gene family believed to evolve by tandem duplication in response to adaptive pressures. Taxa in the Myrtaceae family are renowned for their diversity of terpenoid-rich essential oils, and among them, the tribe Eucalypteae has the largest TPS gene family found in any plant (> 100 TPS). In this study, comparative analysis of Melaleuca alternifolia (tea tree), from the related tribe Melaleuceae, revealed that some Myrtaceae have smaller TPS families: manual annotation of a newly released long-read assembly of the genome identified a total of 58 putatively functional full-length TPS genes and 21 pseudogenes. The TPS-a and TPS-b2 subfamilies, which synthesise secondary compounds that often mediate plant-environment interactions, were smaller than those in eucalypts, probably reflecting key differences in the evolutionary histories of the two lineages. Of the putatively functional TPS-b1 genes, 13 clustered into a region of around 400 kb on one scaffold. The organisation of these TPS genes suggested that tandem duplication was instrumental in the evolution and diversity of terpene chemistry in Melaleuca. Four TPS-b1 genes likely to catalyse the synthesis of the three monoterpenoid components used to classify tea tree chemotypes were encoded within a single small region of 87 kb in the larger TPS-b1 cluster. This raises the possibility that coregulation and linkage cause them to behave as a single locus, which would explain the categorical inheritance of complex multiple-component chemotypes in the taxon.
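The idea of genes clustering within a region of a scaffold can be illustrated with a short Python sketch that groups neighbouring genes on the same scaffold when the gap between them is small. This is not the method of the study, which relied on manual annotation; the function name `tandem_clusters`, the 50 kb gap threshold, and the gene names and coordinates are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def tandem_clusters(genes, max_gap=50_000, min_size=2):
    """Group genes into positional clusters per scaffold.

    genes: iterable of (name, scaffold, start, end) tuples.
    max_gap: largest allowed distance (bp) between the end of a cluster
             and the start of the next gene (assumed threshold).
    min_size: smallest number of genes reported as a cluster.
    """
    by_scaffold = defaultdict(list)
    for name, scaffold, start, end in genes:
        by_scaffold[scaffold].append((start, end, name))

    clusters = []
    for scaffold, items in by_scaffold.items():
        items.sort()  # order genes by start coordinate
        names, c_start, c_end = [], None, None
        for start, end, name in items:
            if names and start - c_end <= max_gap:
                # Gene is close enough: extend the current cluster.
                names.append(name)
                c_end = max(c_end, end)
            else:
                # Close the previous cluster (if large enough), start a new one.
                if len(names) >= min_size:
                    clusters.append({"scaffold": scaffold, "start": c_start,
                                     "end": c_end, "genes": names})
                names, c_start, c_end = [name], start, end
        if len(names) >= min_size:
            clusters.append({"scaffold": scaffold, "start": c_start,
                             "end": c_end, "genes": names})
    return clusters


# Hypothetical gene coordinates, not taken from the tea tree annotation.
toy_genes = [
    ("tpsA", "s1", 1_000, 4_000),
    ("tpsB", "s1", 20_000, 23_000),
    ("tpsC", "s1", 60_000, 63_000),
    ("tpsD", "s1", 500_000, 503_000),
    ("tpsE", "s2", 100, 3_000),
]
for cluster in tandem_clusters(toy_genes):
    span_kb = (cluster["end"] - cluster["start"]) / 1000
    print(cluster["scaffold"], cluster["genes"], f"{span_kb:.0f} kb")
```

Positional grouping of this kind only flags candidate clusters; deciding whether neighbouring genes arose by tandem duplication would additionally require sequence and phylogenetic evidence.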
This protocol supplements the publication on the genome assembly of Melaleuca alternifolia (tea tree) and should be read together with it. It extends the methods listed in the manuscript, in particular the computational methods and specific commands used for genome assembly and annotation, as well as the various quality control and filtering steps.
This protocol was created to be read in conjunction with the publication on the manual annotation of the terpene synthase (TPS) gene family in Melaleuca alternifolia (tea tree). It provides additional information on the computational methods mentioned in the manuscript, in particular the specific commands needed to reproduce the methodology. When using and citing this protocol, please also refer to the original article, as given in the metadata section of this protocol.