The new algorithm, named motif discovery using orthologous sequences (MDOS), is available at http://www.ics.uci.edu/~xhx/project/mdos/.
A major goal of human genetics is to elucidate the genetic architecture of human disease, in order to improve diagnosis and the understanding of disease pathogenesis. The degree to which epistasis, or non-additive effects of risk alleles at different loci, accounts for common disease traits is hotly debated, in part because the conditions under which epistasis evolves are not well understood. Using both theory and evolutionary simulation, we show that the occurrence of common diseases (i.e. unfit phenotypes with frequencies on the order of 1%) can, under the right circumstances, be expected to be driven primarily by synergistic epistatic interactions. Conditions that are collectively necessary for this outcome include a strongly non-linear phenotypic landscape, strong (but not too strong) selection against the disease phenotype, and “noise” in the genotype-phenotype map that is both environmental (extrinsic, time-correlated) and developmental (intrinsic, uncorrelated) and, in both cases, neither too little nor too great. These results suggest ways in which geneticists might identify, a priori, those disease traits for which an “epistatic explanation” should be sought, and in the process better focus ongoing searches for risk alleles.
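One ingredient named above, a strongly non-linear phenotypic landscape, can be illustrated in a few lines. The sketch below assumes a hypothetical logistic map from a liability score to disease risk (the `threshold` and `steepness` values are made up). Two risk alleles add linearly to liability, so there is no interaction at the genetic level, yet their joint effect on risk is non-additive purely because of the curvature of the map: synergistic in the low-risk region below the threshold and antagonistic above it. This covers only the landscape ingredient, not selection or genotype-phenotype noise, and it is not the authors' simulation model.

```python
import math


def disease_risk(liability: float, threshold: float = 4.0, steepness: float = 3.0) -> float:
    """Probability of the disease phenotype as a steep logistic function of liability.

    Hypothetical non-linear phenotypic landscape: risk stays near zero well below
    `threshold`, equals 0.5 at the threshold, and approaches 1 above it.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (liability - threshold)))


def epistasis_on_risk_scale(base: float, effect_a: float, effect_b: float) -> float:
    """Deviation from additivity of two risk alleles, measured on the risk scale.

    Each allele adds linearly to liability, so any non-zero result comes only
    from the curvature of `disease_risk`. Positive = synergistic, negative = antagonistic.
    """
    r00 = disease_risk(base)                        # neither risk allele
    r10 = disease_risk(base + effect_a)             # allele A only
    r01 = disease_risk(base + effect_b)             # allele B only
    r11 = disease_risk(base + effect_a + effect_b)  # both alleles
    return r11 - r10 - r01 + r00


if __name__ == "__main__":
    print(round(epistasis_on_risk_scale(2.0, 1.0, 1.0), 3))  # 0.408: synergistic below threshold
    print(round(epistasis_on_risk_scale(4.0, 1.0, 1.0), 3))  # -0.408: antagonistic above threshold
```

With a linear map the same function would return zero for any inputs, which is why the non-linearity of the landscape is necessary (though, per the abstract, not sufficient) for epistasis to dominate.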
Many approaches have been developed to overcome technical noise in single cell (and single nucleus) RNA-sequencing (scRNAseq). As researchers dig deeper into data--looking for rare cell types, subtleties of cell states, and details of gene regulatory networks--there is a growing need for algorithms with controllable accuracy and a minimum of ad hoc parameters and thresholds. Impeding this goal is the fact that an appropriate null distribution for scRNAseq cannot simply be extracted from data when ground truth about biological variation is unknown (i.e., most of the time). Here we approach this problem analytically, based on the assumption that scRNAseq data reflect only cell heterogeneity (what we seek to characterize), transcriptional noise (temporal fluctuations randomly distributed across cells), and sampling error (i.e., Poisson noise). We then analyze scRNAseq data without normalization--a step that can skew distributions, particularly for sparse data--and calculate p-values associated with key statistics. We develop an improved method for the selection of features for cell clustering and the identification of gene-gene correlations, both positive and negative. Using simulated data, we show that this method, which we call BigSur (Basic Informatics and Gene Statistics from Unnormalized Reads), accurately captures even weak yet significant correlation structures in scRNAseq data. Applying BigSur to data from a clonal human melanoma cell line, we identify tens of thousands of correlations that, when clustered without supervision into gene communities, both align with cellular components and biological processes, and point toward potentially novel cell biological relationships.
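The null model described above (cell heterogeneity, transcriptional noise, and Poisson sampling, applied to unnormalized counts) can be sketched in a simplified form. The code below assumes a generic construction rather than BigSur's published statistics: expected counts come from gene and cell totals, so cell size enters through the expectation instead of through a normalization step; residuals are standardized by a variance with a Poisson term `mu` plus a hypothetical transcriptional-noise term `(cv * mu)**2`; the per-gene mean squared residual serves as a Fano-like feature score; and gene-gene correlation is measured on the residuals. The function names, the `cv` parameter, and the Fisher z-transform p-value are illustrative assumptions, not details taken from the abstract.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = genes, columns = cells (raw UMI counts)


def expected_counts(counts: Matrix) -> Matrix:
    """Expected counts if genes were spread across cells only by cell size.

    mu[g][c] = gene_total[g] * cell_total[c] / grand_total, computed from raw
    counts with no normalization. Assumes every gene and cell has >= 1 count.
    """
    gene_tot = [sum(row) for row in counts]
    cell_tot = [sum(col) for col in zip(*counts)]
    grand = sum(gene_tot)
    return [[g * c / grand for c in cell_tot] for g in gene_tot]


def residuals(counts: Matrix, cv: float = 0.0) -> Matrix:
    """Standardized residuals (n - mu) / sqrt(mu + (cv * mu)**2).

    mu is the Poisson (sampling-error) variance; (cv * mu)**2 is a hypothetical
    transcriptional-noise term with coefficient of variation `cv`.
    """
    mu = expected_counts(counts)
    return [
        [(n - m) / math.sqrt(m + (cv * m) ** 2) for n, m in zip(row, mu_row)]
        for row, mu_row in zip(counts, mu)
    ]


def excess_variance(counts: Matrix, cv: float = 0.0) -> List[float]:
    """Mean squared residual per gene: roughly 1 under the null, larger if cells differ."""
    return [sum(r * r for r in row) / len(row) for row in residuals(counts, cv)]


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length vectors (neither may be constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def correlation_p_value(r: float, n_cells: int) -> float:
    """Two-sided p-value via the Fisher z-transform (needs n_cells > 3 and |r| < 1).

    A generic normal approximation swapped in for illustration; it is crude for
    sparse counts and is not the method described in the BigSur abstract.
    """
    z = math.atanh(r) * math.sqrt(n_cells - 3)
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Toy data: genes 0 and 1 rise and fall together across four cells.
    counts = [[10, 0, 10, 0], [9, 1, 10, 0], [5, 5, 5, 5]]
    res = residuals(counts)
    r = pearson(res[0], res[1])
    print(excess_variance(counts), r, correlation_p_value(r, n_cells=len(counts[0])))
```

With only four cells the p-value is uninformative; the toy matrix just shows how raw counts flow through expectation, residuals, feature scores, and correlations without a normalization step.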