Phylogenomics, the use of large-scale data matrices in phylogenetic analyses, has been viewed as the ultimate solution to the problem of resolving difficult nodes in the tree of life. However, it has become clear that analyses of these large genomic data sets can also result in conflicting estimates of phylogeny. Here, we use the early divergences in Neoaves, the largest clade of extant birds, as a "model system" to understand the basis for incongruence among phylogenomic trees. We were motivated by the observation that trees from two recent avian phylogenomic studies exhibit conflicts. Those studies used different strategies: 1) collecting many characters [$\sim$ 42 mega base pairs (Mbp) of sequence data] from 48 birds, sometimes including only one taxon for each major clade; and 2) collecting fewer characters ($\sim$ 0.4 Mbp) from 198 birds, selected to subdivide long branches. However, the studies also used different data types: the taxon-poor data matrix comprised 68% non-coding sequences, whereas coding exons dominated the taxon-rich data matrix. This difference raises the question of whether the primary reason for incongruence is the number of sites, the number of taxa, or the data type. To test among these alternative hypotheses, we assembled a novel, large-scale data matrix comprising 90% non-coding sequences from 235 bird species. Although increased taxon sampling appeared to have a positive impact on phylogenetic analyses, the most important variable was data type. Indeed, by analyzing different subsets of the taxa in our data matrix, we found that increased taxon sampling actually resulted in increased congruence with the tree from the previous taxon-poor study (which had a majority of non-coding data) instead of the taxon-rich study (which largely used coding data).
We suggest that the observed differences in the estimates of topology for these studies reflect data-type effects due to violations of the models used in phylogenetic analyses, some of which may be difficult to detect. If incongruence among trees estimated using phylogenomic methods largely reflects problems with model fit, developing more "biologically-realistic" models is likely to be critical for efforts to reconstruct the tree of life. [Birds; coding exons; GTR model; model fit; Neoaves; non-coding DNA; phylogenomics; taxon sampling.]
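One way to make "congruence" between two estimated trees concrete is to count the clades that appear in one tree but not the other. The sketch below is only an illustration: the abstract does not say which measure of congruence the authors used, and the nested-tuple tree format, the function names, and the taxon labels are all hypothetical. It computes a rooted, clade-based Robinson-Foulds-style distance.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree as frozensets of leaf names.

    The tree is a nested tuple of strings, e.g. ((("a", "b"), "c"), "d").
    This input format is an assumption made for illustration.
    """
    found = set()

    def leaves(node):
        # A string is a leaf; a tuple is an internal node.
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree on these taxa
    return found


def clade_distance(tree_a, tree_b):
    """Count clades present in exactly one of the two trees (0 means identical topologies)."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical four-taxon trees that disagree about one grouping.
tree_a = ((("a", "b"), "c"), "d")
tree_b = ((("a", "c"), "b"), "d")
print(clade_distance(tree_a, tree_b))
```

A smaller distance between a newly estimated tree and a reference tree corresponds to greater congruence with that reference.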
Phylogenomics, the use of large datasets to examine phylogeny, has revolutionized the study of evolutionary relationships. However, genome-scale data have not been able to resolve all relationships in the tree of life; this could reflect, at least in part, the poor fit of the models used to analyze heterogeneous datasets. Some of the heterogeneity may reflect the different patterns of selection on proteins based on their structures. To test that hypothesis, we developed a pipeline to divide phylogenomic protein datasets into subsets based on secondary structure and relative solvent accessibility. We then tested whether amino acids in different structural environments had distinct signals for the topology of the deepest branches in the metazoan tree. We focused on a dataset that appeared to have a mixture of signals, and we found that the most striking difference in phylogenetic signal reflected relative solvent accessibility. Analyses of exposed sites (residues located on the surface of proteins) yielded a tree that placed ctenophores sister to all other animals, whereas sites buried inside proteins yielded a tree with a sponge+ctenophore clade. These differences in phylogenetic signal were not ameliorated when we conducted analyses using a set of maximum-likelihood profile mixture models. These models are very similar to the Bayesian CAT model, which has been used in many analyses of deep metazoan phylogeny. In contrast, analyses conducted after recoding amino acids to limit the impact of deviations from compositional stationarity increased the congruence in the estimates of phylogeny for exposed and buried sites; after recoding, the amino acid trees estimated using the exposed and buried sites both supported placement of ctenophores sister to all other animals.
Although the central conclusion of our analyses is that sites in different structural environments yield distinct trees when analyzed using models of protein evolution, our amino acid recoding analyses also have implications for metazoan evolution. Specifically, our results add to the evidence that ctenophores are the sister group of all other animals and they further suggest that the placozoa+cnidaria clade found in some other studies deserves more attention. Taken as a whole, these results provide striking evidence that it is necessary to achieve a better understanding of the constraints due to protein structure to improve phylogenetic estimation.
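The two data manipulations described above, splitting sites by structural environment and recoding amino acids into reduced alphabets, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not name the recoding scheme, so the widely used Dayhoff six-state grouping is assumed here, and the per-column `exposed` mask, the function names, and the toy alignment are hypothetical.

```python
# Dayhoff six-state grouping (assumed scheme; the abstract does not specify one).
DAYHOFF6_GROUPS = {
    "0": "AGPST",
    "1": "DENQ",
    "2": "HKR",
    "3": "ILMV",
    "4": "FWY",
    "5": "C",
}
RECODE = {aa: code for code, group in DAYHOFF6_GROUPS.items() for aa in group}


def recode(seq):
    """Map each amino acid to its group code; keep gaps, turn anything else into '?'."""
    return "".join(
        RECODE.get(res, res if res in "-?" else "?") for res in seq.upper()
    )


def split_columns(alignment, exposed):
    """Split alignment columns into exposed and buried subsets.

    `alignment` maps taxon name to aligned sequence; `exposed` is a hypothetical
    per-column boolean mask (True for solvent-exposed sites).
    """
    exposed_part = {
        name: "".join(c for c, e in zip(seq, exposed) if e)
        for name, seq in alignment.items()
    }
    buried_part = {
        name: "".join(c for c, e in zip(seq, exposed) if not e)
        for name, seq in alignment.items()
    }
    return exposed_part, buried_part


# Toy alignment with made-up taxa and a made-up exposure mask.
alignment = {"taxon1": "ACDE", "taxon2": "GCNE"}
exposed_sites, buried_sites = split_columns(alignment, [True, False, True, False])
recoded_exposed = {name: recode(seq) for name, seq in exposed_sites.items()}
```

Recoding collapses substitutions within a group, which is why it is used to reduce the influence of compositional differences among taxa; the cost is that some phylogenetic information is discarded.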
Background: Previous phylogenetic studies that include the four recognized species of Gallus have resulted in a number of distinct topologies, with little agreement. Several factors could lead to the failure to converge on a consistent topology, including introgression, incomplete lineage sorting, different data types, or insufficient data. Methods: We generated three novel whole genome assemblies for Gallus species, which we combined with data from the published genomes of Gallus gallus and Bambusicola thoracicus (a member of the sister genus to Gallus). To determine why previous studies have failed to converge on a single topology, we extracted large numbers of orthologous exons, introns, ultra-conserved elements, and conserved non-exonic elements from the genome assemblies. This provided more than 32 million base pairs of data that we used for concatenated maximum likelihood and multispecies coalescent analyses of Gallus. Results: All of our analyses, regardless of data type, yielded a single, well-supported topology. We found some evidence for ancient introgression involving specific Gallus lineages as well as modest data type effects that had an impact on support and branch length estimates in specific analyses. However, the estimated gene tree spectra for all data types had a relatively good fit to their expectation given the multispecies coalescent. Conclusions: Overall, our data suggest that conflicts among previous studies probably reflect the use of smaller datasets (both in terms of number of sites and of loci) in those analyses. Our results demonstrate the importance of sampling large numbers of loci, each of which has a sufficient number of sites to provide robust estimates of gene trees. Low-coverage whole genome sequencing, as used here, is a cost-effective means of generating very large data sets that include multiple data types; such data enabled us to obtain a robust estimate of Gallus phylogeny.
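The idea of comparing a gene tree spectrum with its expectation under the multispecies coalescent can be illustrated with the simplest case, a rooted three-taxon species tree. For an internal branch of length T (in coalescent units), the gene tree matching the species tree has expected frequency 1 - (2/3)e^(-T), and each of the two discordant topologies has frequency (1/3)e^(-T). The sketch below is a simplified illustration only: the study analyzed more taxa, the abstract does not describe its fit statistic, and the function names and counts are hypothetical.

```python
import math


def expected_triplet_frequencies(internal_branch_cu):
    """Expected gene tree frequencies for a rooted three-taxon species tree.

    `internal_branch_cu` is the internal branch length in coalescent units.
    Returns (concordant, discordant_1, discordant_2).
    """
    discordant = math.exp(-internal_branch_cu) / 3.0
    concordant = 1.0 - 2.0 * discordant
    return concordant, discordant, discordant


def chi_square_statistic(observed_counts, expected_freqs):
    """Pearson chi-square statistic comparing observed counts with expected frequencies."""
    total = sum(observed_counts)
    return sum(
        (obs - total * freq) ** 2 / (total * freq)
        for obs, freq in zip(observed_counts, expected_freqs)
    )


# Hypothetical counts of the three gene tree topologies across 100 loci.
observed = (50, 25, 25)
expected = expected_triplet_frequencies(math.log(4.0 / 3.0))
fit = chi_square_statistic(observed, expected)
```

Under this model the two discordant topologies are expected to be equally frequent, so a strong asymmetry between them is one signal that something other than incomplete lineage sorting, such as introgression, may be involved.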
Lineage-specific transcriptional networks drive cellular differentiation and development. Disruption of these specific cell programs can result in cancer and create a subset of tumors that are “transcriptionally addicted.” Sarcomas, for example, are characterized by an oncogenic fusion protein consisting of a FET family RNA-binding protein fused to a transcription factor (TF). The oncogenic fusions result in a restructuring of the transcriptome that promotes cancer. In chordoma, TBXT (brachyury), which is normally involved in notochord differentiation, is aberrantly expressed, resulting in promotion of cancer. As these oncogenic TFs are difficult to target directly, we and others have proposed targeting associated transcriptional regulators to inhibit their activity. Cyclin-dependent kinase 9 (CDK9) interacts with TFs to promote activation of target genes and promotes transcription elongation through phosphorylation of RNA polymerase II. We have developed a potent, selective, and orally bioavailable CDK9 inhibitor, KB-0742. Here we present the preclinical activity of KB-0742 in models of sarcoma and chordoma. We first evaluated KB-0742 activity in sarcoma in a screen of 300 immortalized cell lines that included 18 soft tissue sarcoma cell lines. The median IC50 of the cell lines in the study was 0.8705 µM. Sarcoma cell lines were enriched for sensitivity to KB-0742, with 61% (11/18) of the lines having an IC50 below the median. The 11 cell lines had a median IC50 of 0.679 µM. We then evaluated the activity of KB-0742 in 5 patient-derived cell line (PDC) models, with all 5 showing a cytotoxic response to treatment as measured by negative GRmax values. Last, a single patient-derived organoid (PDO) model of adult rhabdomyosarcoma was evaluated. KB-0742 treatment resulted in an IC50 of 2.75 µM and a max inhibition rate of 98.61%. For chordoma, we examined the activity of KB-0742 in vivo using 2 patient-derived xenograft (PDX) models.
In model CF466, we observed a dose-dependent response with increased tumor growth inhibition (TGI) and greater reductions in RNA polymerase II phosphorylation in tumors treated with KB-0742 at 60 mg/kg as compared to 30 mg/kg. We then evaluated KB-0742 as a single agent and in combination with afatinib in the model CF539. KB-0742 as a single agent showed TGI activity similar to that of afatinib, whereas the combination showed an increased response as compared to the two single-agent arms. These data show that transcriptionally addicted tumors are sensitive to CDK9 inhibition via KB-0742 treatment, and support the continued development of our compound to potentially treat sarcoma and chordoma. KB-0742 is currently being evaluated in a phase 1/2 clinical trial (NCT04718675) for relapsed or refractory solid tumors or non-Hodgkin lymphoma. Once a recommended phase 2 dose is established, expansion cohorts for patients with sarcoma, chordoma, and other transcriptionally addicted tumors may be opened. Citation Format: Melinda A. Day, Douglas C. Saffran, Tressa Hood, Nikolaus Obholzer, Akanksha Pandey, Charles Y. Lin, Pavan Kumar, Daniel M. Freed, Jorge DiMartino. CDK9 inhibition via KB-0742 is a potential strategy to treat transcriptionally addicted cancers [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 2564.