Ancestral sequence reconstruction is a technique that is gaining widespread use in molecular evolution studies and protein engineering. Accurate reconstruction requires the ability to handle appropriately large numbers of sequences, as well as insertion and deletion (indel) events, but available approaches exhibit limitations. To address these limitations, we developed Graphical Representation of Ancestral Sequence Predictions (GRASP), which efficiently implements maximum likelihood methods to enable the inference of ancestors of families with more than 10,000 members. GRASP implements partial order graphs (POGs) to represent and infer insertion and deletion events across ancestors, enabling the identification of building blocks for protein engineering. To validate the capacity to engineer novel proteins from realistic data, we predicted ancestor sequences across three distinct enzyme families: glucose-methanol-choline (GMC) oxidoreductases, cytochromes P450, and dihydroxy/sugar acid dehydratases (DHAD). All tested ancestors demonstrated enzymatic activity. Our study demonstrates the ability of GRASP (1) to support large data sets over 10,000 sequences and (2) to employ insertions and deletions to identify building blocks for engineering biologically active ancestors, by exploring variation over evolutionary time.
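As a rough illustration of how a partial order graph can encode indels, the sketch below builds a small graph from a toy alignment. It is a simplified assumption, not GRASP's implementation: the function name `build_pog`, the sentinel start/end nodes, and the example sequences are all hypothetical. Each alignment column becomes a node, and each sequence contributes edges between its consecutive non-gap columns, so an edge that jumps over one or more columns marks a path in which those columns are absent.

```python
from collections import defaultdict


def build_pog(alignment):
    """Build a toy partial order graph from equal-length aligned sequences.

    Nodes are alignment columns (mapped to the residues seen there);
    edges link consecutive non-gap columns within each sequence and are
    weighted by the number of sequences that support them.
    """
    n_cols = len(alignment[0])
    start, end = -1, n_cols  # sentinel nodes (an assumption of this sketch)
    nodes = defaultdict(set)
    edges = defaultdict(int)
    for seq in alignment:
        if len(seq) != n_cols:
            raise ValueError("all aligned sequences must have the same length")
        prev = start
        for col, residue in enumerate(seq):
            if residue == "-":
                continue  # gaps create no node; the next edge skips this column
            nodes[col].add(residue)
            edges[(prev, col)] += 1
            prev = col
        edges[(prev, end)] += 1
    return dict(nodes), dict(edges)


# Hypothetical three-sequence alignment with a variable-length indel.
alignment = ["MKT-LV", "MK--LV", "MKTALV"]
nodes, edges = build_pog(alignment)

# Edges that skip at least one column correspond to indel paths.
indel_edges = sorted(e for e in edges if e[1] - e[0] > 1)
print(indel_edges)
```

In this toy graph, the alternative paths through columns 2 and 3 are the kind of indel variation that the abstract describes as candidate building blocks; inferring which path an ancestor takes is a separate step that this sketch does not attempt.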
We developed Graphical Representation of Ancestral Sequence Predictions (GRASP) to infer and explore ancestral variants of protein families with more than 10,000 members. GRASP uses partial order graphs to represent homology in very large datasets that are intractable with current inference tools; it may, for example, be used to engineer proteins by identifying ancient variants of enzymes. We demonstrate that (1) across three distinct enzyme families, GRASP predicts ancestor sequences, all of which demonstrate enzymatic activity, (2) within-family insertions and deletions can be used as building blocks to support the engineering of biologically active ancestors via a new source of ancestral variation, and (3) generous inclusion of sequence data encompassing great diversity leads to less variance in ancestor sequence. GRASP is the central tool in the GRASP-suite, which is freely available at
Accurately identifying cell populations is paramount to the quality of downstream analyses and overall interpretations of single-cell RNA-seq (scRNA-seq) datasets but remains a challenge. The quality of single-cell clustering depends on the proximity metric used to generate cell-to-cell distances. Accordingly, proximity metrics have been benchmarked for scRNA-seq clustering, typically with results averaged across datasets to identify a highest-performing metric. However, the ‘best-performing’ metric varies between studies, with the performance differing significantly between datasets. This suggests that the unique structural properties of an scRNA-seq dataset, specific to the biological system under study, have a substantial impact on proximity metric performance. Previous benchmarking studies have not factored these structural properties into their evaluations. To address this gap, we developed a framework for the in-depth evaluation of the performance of 17 proximity metrics with respect to core structural properties of scRNA-seq data, including sparsity, dimensionality, cell-population distribution and rarity. We find that clustering performance can be improved substantially by the selection of an appropriate proximity metric and neighbourhood size for the structural properties of a dataset, in addition to performing suitable pre-processing and dimensionality reduction. Furthermore, popular metrics such as Euclidean and Manhattan distance performed poorly in comparison to several less commonly applied metrics, suggesting that the default metric for many scRNA-seq methods should be re-evaluated. Our findings highlight the critical nature of tailoring scRNA-seq analysis pipelines to the dataset under study and provide practical guidance for researchers looking to optimize cell-similarity search for the structural properties of their own data.
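To make concrete how the choice of proximity metric can change a cell's neighbourhood, here is a minimal sketch on invented data. It is not the authors' benchmarking framework: the three toy "cells", the helper `nearest_neighbours`, and the selection of metrics are assumptions chosen only for illustration. Cells 0 and 1 share the same relative expression profile at different sequencing depths, while cell 2 has a different profile at a depth similar to cell 0.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # convention for all-zero vectors (an assumption here)
    return 1.0 - dot / (norm_a * norm_b)


METRICS = {
    "euclidean": euclidean,
    "manhattan": manhattan,
    "cosine": cosine_distance,
}


def nearest_neighbours(cells, query, k, metric):
    """Return indices of the k cells closest to cells[query] under metric."""
    dist = METRICS[metric]
    others = [i for i in range(len(cells)) if i != query]
    others.sort(key=lambda i: dist(cells[query], cells[i]))
    return others[:k]


# Hypothetical count vectors over four genes (invented for illustration).
cells = [
    [10, 0, 5, 0],  # cell 0: profile A, deeper sequencing
    [2, 0, 1, 0],   # cell 1: profile A, shallower sequencing
    [8, 1, 0, 6],   # cell 2: profile B, depth similar to cell 0
]

for name in METRICS:
    print(name, nearest_neighbours(cells, query=0, k=1, metric=name))
```

On this toy input, Euclidean distance pairs cell 0 with the depth-matched cell 2, whereas cosine distance pairs it with cell 1, which shares its profile. The example only shows that neighbourhoods depend on the metric; it says nothing about which metric performs best on real data, which the abstract reports as dataset-dependent.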
A prominent aspect of most, if not all, central nervous systems (CNSs) is that anterior regions (brain) are larger than posterior ones (spinal cord). Studies in Drosophila and mouse have revealed that Polycomb Repressor Complex 2 (PRC2), a protein complex responsible for applying key repressive histone modifications, acts by several mechanisms to promote anterior CNS expansion. However, it is unclear what the full spectrum of PRC2 action is during embryonic CNS development and how PRC2 intersects with the epigenetic landscape. We removed PRC2 function from the developing mouse CNS, by mutating the key gene Eed, and generated spatio-temporal transcriptomic data. To decode the role of PRC2, we developed a method that incorporates standard statistical analyses with probabilistic deep learning to integrate the transcriptomic response to PRC2 inactivation with epigenetic data. This multi-variate analysis corroborates the central involvement of PRC2 in anterior CNS expansion, and also identifies several unanticipated cohorts of genes, such as proliferation and immune response genes. Furthermore, the analysis reveals specific profiles of regulation via PRC2 upon these gene cohorts. These findings uncover a differential logic for the role of PRC2 upon functionally distinct gene cohorts that drive CNS anterior expansion. To support the analysis of emerging multi-modal datasets, we provide a novel bioinformatics package that integrates transcriptomic and epigenetic datasets to identify regulatory underpinnings of heterogeneous biological processes.
Fumarate hydratase (FH) is a mitochondrial enzyme that catalyzes the reversible hydration of fumarate to malate in the tricarboxylic acid (TCA) cycle. Germline mutations of FH lead to hereditary leiomyomatosis and renal cell carcinoma (HLRCC), a cancer syndrome characterized by a highly aggressive form of renal cancer. Although HLRCC tumors metastasize rapidly, FH-deficient mice develop premalignant cysts in the kidneys, rather than carcinomas. How Fh1-deficient cells overcome these tumor-suppressive events during transformation is unknown. Here, we perform a genome-wide CRISPR-Cas9 screen to identify genes that, when ablated, enhance the proliferation of Fh1-deficient cells. We found that the depletion of the histone cell cycle regulator (HIRA) enhances proliferation and invasion of Fh1-deficient cells in vitro and in vivo. Mechanistically, Hira loss activates MYC and its target genes, increasing nucleotide metabolism specifically in Fh1-deficient cells, independent of its histone chaperone activity. These results are instrumental for understanding mechanisms of tumorigenesis in HLRCC and the development of targeted treatments for patients.