SUMMARY
Variability in induced pluripotent stem cell (iPSC) lines remains a concern for disease modeling and regenerative medicine. We have used RNA sequencing analysis and linear mixed models to examine the sources of gene expression variability in 317 human iPSC lines from 101 individuals. We found that ~50% of genome-wide expression variability is explained by variation across individuals and identified a set of expression quantitative trait loci that contribute to this variation. These analyses coupled with allele specific expression show that iPSCs retain a donor specific gene expression pattern. Network, pathway and key driver analyses showed that Polycomb targets contribute significantly to the non-genetic variability seen within and across individuals, highlighting this chromatin regulator as a likely source of reprogramming-based variability. Our findings therefore shed light on variation between iPSC lines and illustrate the potential for our dataset and other similar large-scale analyses to identify underlying drivers relevant to iPSC applications.
By integrating genome, transcriptome and proteome data, Petyuk et al. identify HSPA2 as a driver of pathology in Alzheimer’s disease, replicating the finding in an independent cohort and validating it in two in vitro systems. The results highlight the power of systems approaches for identifying genes involved in disease pathways.
The problem of generating random samples of high-dimensional posterior distributions is considered. The main results consist of non-asymptotic computational guarantees for Langevin-type MCMC algorithms which scale polynomially in key quantities such as the dimension of the model, the desired precision level, and the number of available statistical measurements. As a direct consequence, it is shown that posterior mean vectors as well as optimisation based maximum a posteriori (MAP) estimates are computable in polynomial time, with high probability under the distribution of the data. These results are complemented by statistical guarantees for recovery of the ground truth parameter generating the data.
Our results are derived in a general high-dimensional non-linear regression setting (with Gaussian process priors) where posterior measures are not necessarily log-concave, employing a set of local ‘geometric’ assumptions on the parameter space, and assuming that a good initialiser of the algorithm is available. The theory is applied to a representative non-linear example from PDEs involving a steady-state Schrödinger equation.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.