The emergence of type III polyketide synthases (PKSs) was a prerequisite for the conquest of land by the green lineage. To study the deep evolutionary history of this key family, we used phylogenomic synteny network and phylogenetic analyses of whole-genome data from 126 species spanning the green lineage. This study thereby combined analysis of genomic location and context with analysis of changes in gene sequences. We found that two major clades, CHS and LAP5/6 homologs, evolved early by a segmental duplication event prior to the divergence of Bryophytes and Tracheophytes. We propose that the macroevolution of the type III PKS superfamily is governed by whole-genome duplications and triplications. Intriguingly, the combined phylogenetic and synteny analyses in this study provide new insights into how changes in genomic location and context are retained over longer time scales, whereas more recent functional divergence is captured by gene sequence alterations.

Type III PKSs are present in all land plants, albeit in varying copy numbers. However, type III PKSs were either not found or found only in low copy numbers in the Chlorophyta, and are absent from the Chlorokybophyceae, Mesostigmaphyceae and Coleochaetaphyceae of the Charophyta (see Supplementary Text). By contrast, type III PKSs were detected in Penium margaritaceum (36).
Synteny network analysis detects clade-specific and reaction type-specific clusters

To study the diversification of the type III PKS superfamily, we followed a synteny network approach (32). Whole genomes of 126 species were compared in a pairwise manner, followed by robust block detection of regions containing type III PKS genes and network analysis to detect syntenic clusters within the network. The resulting network contained 706 vertices, each corresponding to a syntenic region containing single or multiple type III PKS genes; 166 of these vertices corresponded to regions with tandem-duplicated genes, from a total of 105 species (Supplementary Table S2).

Tandem-duplicated genes may play important roles in providing genetic redundancy, gene dosage balance and genetic robustness, and may provide an additional means for divergence in transcriptional regulation and protein sequence (37-40). The highest number of tandem-duplicated PKS genes in one syntenic region was 23 (Arachis duranensis, containing mainly 'R-4-A'-type PKS sequences; Fig. 1 and Fig. 2). Arachis ipaensis (21 genes) and Vitis vinifera (20 genes) had the second- and third-highest numbers of tandem-duplicated genes, respectively (also containing mostly 'R-4-A'-type sequences).
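
The network construction described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the region names, gene names and edges are hypothetical toy data, clusters are approximated as connected components (the text here does not specify the clustering algorithm), and a region is treated as tandem-duplicated whenever it holds more than one PKS gene, which is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical toy data. Each vertex is a syntenic region, mapped to the
# type III PKS genes it contains; each edge is a pairwise syntenic block
# linking two regions from different genomes.
regions = {
    "spA_r1": ["A1", "A2", "A3"],
    "spB_r1": ["B1"],
    "spC_r1": ["C1", "C2"],
    "spA_r2": ["A4"],
    "spD_r1": ["D1"],
}
edges = [("spA_r1", "spB_r1"), ("spB_r1", "spC_r1"), ("spA_r2", "spD_r1")]


def find_clusters(vertices, edges):
    """Group vertices into connected components using union-find.

    Assumption: connected components stand in for whatever cluster
    detection method the actual analysis applied.
    """
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the root, shortening the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for v in vertices:
        groups[find(v)].add(v)
    # Largest clusters first, ties broken alphabetically.
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


def tandem_regions(regions):
    """Return (region, gene count) pairs for regions holding more than
    one PKS gene, largest first (assumed proxy for tandem duplication)."""
    counts = [(name, len(genes)) for name, genes in regions.items()
              if len(genes) > 1]
    return sorted(counts, key=lambda item: (-item[1], item[0]))


clusters = find_clusters(regions, edges)
tandem = tandem_regions(regions)
```

On real data, the vertices and edges would come from the pairwise genome comparisons and block detection, and the ranked output of `tandem_regions` would correspond to the per-region counts of tandem-duplicated genes reported above.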