Summary
A major challenge of the post-genomics era is to define the connectivity of protein phosphorylation networks. Here, we quantitatively delineate the insulin signaling network in adipocytes by high-resolution mass spectrometry-based proteomics. These data reveal the complexity of intracellular protein phosphorylation. We identified 37,248 phosphorylation sites on 5,705 proteins in this single cell type, with approximately 15% responding to insulin. We integrated these large-scale phosphoproteomics data using a machine learning approach to predict physiological substrates of several diverse insulin-regulated kinases. This led to the identification of an Akt substrate, SIN1, a core component of the mTORC2 complex. The phosphorylation of SIN1 by Akt was found to regulate mTORC2 activity in response to growth factors, revealing topological insights into the Akt/mTOR signaling network. The dynamic phosphoproteome described here contains numerous phosphorylation sites on proteins involved in diverse molecular functions and should serve as a useful functional resource for cell biologists.
Ensemble learning is an intensively studied technique in machine learning and pattern recognition. Recent work in computational biology has seen an increasing use of ensemble learning methods because of their distinctive advantages in dealing with small sample sizes, high dimensionality, and complex data structures. The aim of this article is twofold. First, it provides a review of the most widely used ensemble learning methods and their application to various bioinformatics problems, including the main topics of gene expression, mass spectrometry-based proteomics, gene-gene interaction identification from genome-wide association studies, and prediction of regulatory elements from DNA and protein sequences. Second, we try to identify and summarize future trends of ensemble methods in bioinformatics. Promising directions such as ensembles of support vector machines, meta-ensembles, and ensemble-based feature selection are discussed.
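To make the general idea of ensemble learning concrete, here is a minimal bagging sketch in plain Python. It is an illustration of one classic ensemble technique, not a method taken from the review: each base learner (a one-level decision tree, or "stump") is trained on a bootstrap resample of the data, and predictions are combined by majority vote. The helper names `fit_stump`, `fit_bagging`, and `predict_bagging`, the toy data, and the choice of stumps as base learners are all hypothetical choices for this example.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a decision stump by exhaustive search over features and thresholds.

    Returns (feature index, threshold, label if x <= threshold, label otherwise).
    The first stump with the lowest training error is kept.
    """
    labels = sorted(set(y))
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for left in labels:
                for right in labels:
                    err = sum(
                        (left if row[f] <= t else right) != yi
                        for row, yi in zip(X, y)
                    )
                    if err < best_err:
                        best_err, best = err, (f, t, left, right)
    return best


def predict_stump(stump, row):
    f, t, left, right = stump
    return left if row[f] <= t else right


def fit_bagging(X, y, n_estimators=25, seed=0):
    """Train n_estimators stumps, each on a bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return ensemble


def predict_bagging(ensemble, row):
    """Combine the base learners by majority vote."""
    votes = Counter(predict_stump(s, row) for s in ensemble)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [[0.1, 5.0], [0.2, 3.0], [0.3, 4.0], [0.8, 1.0], [0.9, 2.0], [0.7, 0.0]]
    y = [0, 0, 0, 1, 1, 1]
    ens = fit_bagging(X, y)
    print(predict_bagging(ens, [0.05, 4.0]), predict_bagging(ens, [0.95, 1.0]))
```

An odd number of estimators avoids ties in the binary vote. Averaging many learners fit to resampled data reduces variance, which is one reason ensembles are attractive for the small-sample, high-dimensional settings the abstract describes.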
Concerted examination of multiple collections of single-cell RNA sequencing (RNA-seq) data promises further biological insights that cannot be uncovered with individual datasets. Here we present scMerge, an algorithm that integrates multiple single-cell RNA-seq datasets using factor analysis of stably expressed genes and pseudoreplicates across datasets. Using a large collection of public datasets, we benchmark scMerge against published methods and demonstrate that it consistently provides improved cell type separation by removing unwanted factors. scMerge can also enhance biological discovery through robust data integration, which we show by inferring developmental trajectories in a collection of liver datasets.
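The core idea of removing unwanted variation using stably expressed genes can be sketched in a heavily simplified form. This is not the scMerge algorithm: it uses a single unwanted factor estimated by averaging the centered profiles of assumed stably expressed (control) genes, in place of full factor analysis, and it ignores pseudoreplicates entirely. Each gene is then regressed on that factor and the fitted component is subtracted. The function names, gene names (`SEG1`, `SEG2`, `MARKER`), and toy data are hypothetical.

```python
from statistics import mean


def estimate_unwanted_factor(expr, control_genes):
    """Estimate one per-cell unwanted factor from control genes.

    expr maps gene name -> list of expression values (one per cell).
    Simplification: average of the mean-centered control-gene profiles,
    standing in for a proper factor analysis.
    """
    n_cells = len(next(iter(expr.values())))
    centered = []
    for g in control_genes:
        mu = mean(expr[g])
        centered.append([v - mu for v in expr[g]])
    return [mean(col[i] for col in centered) for i in range(n_cells)]


def remove_unwanted_factor(expr, w):
    """Regress each gene on the factor w (least squares) and subtract the fit."""
    w_mean = mean(w)
    wc = [x - w_mean for x in w]
    ss = sum(x * x for x in wc)
    corrected = {}
    for g, vals in expr.items():
        beta = sum(a * b for a, b in zip(wc, vals)) / ss
        corrected[g] = [v - beta * x for v, x in zip(vals, wc)]
    return corrected


if __name__ == "__main__":
    batch = [0, 0, 0, 0, 1, 1, 1, 1]   # hypothetical dataset of origin
    ctype = [0, 1, 0, 1, 0, 1, 0, 1]   # hypothetical cell type
    expr = {
        "SEG1": [5 + 2 * b for b in batch],
        "SEG2": [3 + b for b in batch],
        "MARKER": [1 + 4 * c + 3 * b for c, b in zip(ctype, batch)],
    }
    w = estimate_unwanted_factor(expr, ["SEG1", "SEG2"])
    print(remove_unwanted_factor(expr, w)["MARKER"])
```

In this toy example the batch shift in `MARKER` is removed while its cell-type difference is preserved, because cell type is balanced across batches. With confounded designs, a single regression like this would also remove biological signal, which is the kind of problem that pseudoreplicates are meant to address.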
Highlights
• Multi-omic maps of embryonic stem cells transitioning from naive to primed pluripotency
• Phosphoproteome dynamics precede changes to epigenome, transcriptome, and proteome
• ERK signaling is dispensable beyond the initial phase of exit from naive pluripotency
• Comparative analysis of mouse and human naive and primed pluripotent states