Single-Cell Co-expression Analysis Reveals Distinct Functional Modules, Co-regulation Mechanisms and Clinical Outcomes

Wang, Jie; Xia, Shuli; Arand, Brian; Zhu, Heng; Machiraju, Raghu; Huang, Kun; Qian, Jiang

doi:10.1371/journal.pcbi.1004892

Cited by 41 publications (33 citation statements)

References 52 publications (54 reference statements)


“…However, TSPAN8 promotes cancer cell stemness via activation of Hedgehog signaling 37 . Systems biology approaches can provide immediate functional insights by revealing interactions between genes 39 . A motivation for WGCNA is that genes functioning together are regulated or co-expressed together 40 . Ballouz and co-authors 41 suggested a minimum of 20 samples to predict meaningful functional connectivity.…”

Section: Discussion (mentioning, confidence: 99%)
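The WGCNA rationale quoted above (genes that function together tend to be co-expressed) is usually made concrete as a weighted co-expression network. The sketch below is a minimal illustration under stated assumptions: an unsigned network with adjacency a_ij = |cor(x_i, x_j)|^β and an arbitrary β = 6. The function names are hypothetical and this is not the WGCNA R package API, which chooses β from a scale-free topology fit. The sample-count warning simply mirrors the 20-sample minimum attributed to Ballouz and co-authors.

```python
import warnings

import numpy as np


def soft_threshold_adjacency(expr: np.ndarray, beta: float = 6.0) -> np.ndarray:
    """Unsigned WGCNA-style adjacency a_ij = |cor(x_i, x_j)| ** beta.

    expr is a samples x genes matrix; the result is a genes x genes matrix.
    """
    n_samples = expr.shape[0]
    if n_samples < 20:
        # Ballouz and co-authors suggest at least ~20 samples; warn rather than fail.
        warnings.warn(f"only {n_samples} samples; co-expression estimates may be unreliable")
    corr = np.corrcoef(expr, rowvar=False)  # Pearson correlation between gene columns
    adj = np.abs(corr) ** beta              # soft threshold: weak correlations shrink toward 0
    np.fill_diagonal(adj, 0.0)              # drop self-connections
    return adj


def connectivity(adj: np.ndarray) -> np.ndarray:
    """Whole-network connectivity k_i = sum_j a_ij for each gene."""
    return adj.sum(axis=1)


# Toy data: 30 samples; gene 1 tracks gene 0, gene 2 is independent noise.
rng = np.random.default_rng(0)
base = rng.normal(size=30)
expr = np.column_stack([base, base + 0.1 * rng.normal(size=30), rng.normal(size=30)])
adj = soft_threshold_adjacency(expr)
k = connectivity(adj)
```

Raising |r| to a power keeps strong correlations close to their original weight while pushing weak ones toward zero, which is why the tightly coupled pair (genes 0 and 1) dominates the toy network.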


Deeper insights into long-term survival heterogeneity of Pancreatic Ductal Adenocarcinoma (PDAC) patients using integrative individual- and group-level transcriptome network analyses

Bhardwaj, Josse, Daele, et al. 2020. Preprint.


Background: Pancreatic ductal adenocarcinoma (PDAC) is the seventh leading cause of cancer mortality worldwide, yet its predictive markers for long-term survival are not well known. It is therefore of interest to delineate individual-specific perturbed genes when comparing long-term (LT) and short-term (ST) PDAC survivors, and to exploit integrative individual- and group-based transcriptome profiling.
Method: Using a discovery cohort of 19 PDAC patients from CHU-Liège (Belgium), we first performed differential gene expression (DGE) analysis comparing LT to ST survivors. Second, we adopted unsupervised systems biology approaches to obtain gene modules linked to clinical features. Third, we created individual-specific perturbation profiles and identified key regulators across the LT patients. Furthermore, we applied two gene prioritization approaches: the random walk-based Degree-Aware disease gene prioritizing (DADA) method to develop PDAC disease modules, and Network-based Integration of Multi-omics Data (NetICS) to integrate group-based and individual-specific perturbed genes in relation to PDAC LT survival.
Findings: We identified 173 differentially expressed genes (DEGs) between ST and LT survivors and five modules (including 38 DEGs) associated with clinical traits such as tumor size and chemotherapy. DGE analysis identified differences in genes involved in metabolic and cell cycle activity. Laboratory validation of DEGs suggested a role for REG4 and TSPAN8 in PDAC survival. Individual-specific omics changes across LT survivors revealed biological signatures such as focal adhesion and extracellular matrix receptors, implying a potential role in the molecular-level heterogeneity of LT PDAC survivors. Via NetICS and DADA, we not only identified various known oncogenes such as CUL1, SCF62, EGF, FOSL1, MMP9, and TGFB1, but also highlighted novel genes (TAC1, KCNH7, IRS4, DKK4).
Interpretation: Our proposed analytic workflow shows the advantages of combining clinical and omics data, as well as individual- and group-level transcriptome profiling. It suggests novel potential transcriptomic markers of LT survival heterogeneity in PDAC. Funding: Télévie-FRS-FNRS
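The abstract mentions random walk-based gene prioritization. As a rough illustration only, the sketch below implements a plain random walk with restart (RWR) on a gene network, assuming a symmetric adjacency matrix and an arbitrary restart probability of 0.3. It is not DADA (which adds degree-aware normalization) and not NetICS; the function name and the toy network are hypothetical.

```python
import numpy as np


def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score every node by its steady-state visiting probability from the seed genes."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    # Column-normalize so each column is a transition distribution; isolated nodes stay zero.
    walk = np.divide(adj, col_sums, out=np.zeros_like(adj), where=col_sums > 0)
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)  # restart distribution spread evenly over the seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (walk @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:  # L1 convergence check
            return p_next
        p = p_next
    return p


# Toy path network 0 - 1 - 2 - 3 with gene 0 as the only seed.
path = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
scores = random_walk_with_restart(path, seeds=[0])
ranking = np.argsort(-scores)  # genes ordered from highest to lowest score
```

Genes closer to the seed in the network receive higher scores, so on the path the ranking follows distance from gene 0. A degree-aware method would additionally correct for hub genes that collect probability mass simply because they have many neighbors.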


“…Two genes, NOSTRIN and ADGRG6, were shared by 66% of LTS, and have been reported before to be associated with PDAC survival. 40,53 Drugs bind to their target proteins and perturb the transcriptome of a cancer cell 54 . In our study, analytic functional analysis of individual PEEPs helped to decode homogeneity patterns within LTS.…”

Section: Discussion (mentioning, confidence: 99%)


“…For H. sapiens, Wang and colleagues recently compared the expression profiles of bulk tissue of glioblastoma patients to expression profiles at the single-cell level [35]. Interestingly, they found that coexpression in bulk samples was more strongly associated with similar gene function than coexpression in single-cell samples.…”

Section: B) (mentioning, confidence: 99%)

“…Genes in eukaryotic genomes have also been reported to cluster when they show similar expression, and the genes in these clusters tend to have related functions [28][29][30][31][32][33]. Wang and colleagues, as well as Barkai and colleagues, showed that if two eukaryotic genes have the same expression levels in different conditions, they are likely to be members of the same protein complex or to participate in the same biological pathways [34,35]. Also, Lee and Sonnhammer reported that genes involved in the same biochemical pathways tend to cluster in various eukaryotic genomes [31].…”

Section: Introduction (mentioning, confidence: 99%)

Annotating the Function of Protein-coding Genes Based on Gene Ontology Terms of Neighboring Co-expressed Genes

Tran, Barghash, Helms. 2018. J Proteomics Bioinform.


Proteins are of key importance in virtually every cellular process, but many proteins have still not been annotated with functions due to the experimental difficulties involved in functional assays. To address this problem, many computational methods based on sequence homology, three-dimensional structure, genomic context, and gene expression have been developed to predict protein functions. Here, we tested the performance of a novel approach motivated by the concept of bacterial operons. To predict the substrate specificities of membrane transporters, we combined genomic context-based methods with Gene Ontology and gene expression data, using an SVM to classify genes. We found that in Escherichia coli, the substrate specificities of membrane transporters can be predicted with ca. 90% accuracy from the biological functions of co-expressed neighboring genes. In Saccharomyces cerevisiae and Homo sapiens, the respective accuracies are lower, at around 80%. When applying the same strategy to enzymes of four metabolic classes of Escherichia coli, we found lower accuracies of 77% (2-class prediction) and 68% (4-class prediction), respectively. This suggests that the transfer of functional associations between co-expressed neighboring genes may be case-specific.
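The transporter study above predicts function from the GO terms of co-expressed genomic neighbors. The sketch below shows only the shape of that idea under stated assumptions: the data structures (a neighbor map, a set of co-expressed gene pairs, per-gene GO term lists) and all function names are hypothetical, and a cosine nearest-centroid rule stands in for the SVM the authors actually used.

```python
import math
from collections import Counter


def neighbor_go_features(gene, neighbors, coexpressed, go_terms):
    """Count GO terms over genomic neighbors of `gene` that are also co-expressed with it."""
    feats = Counter()
    for nb in neighbors.get(gene, []):
        if (gene, nb) in coexpressed or (nb, gene) in coexpressed:
            feats.update(go_terms.get(nb, []))  # go_terms values are lists of term names
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(v * b[t] for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_centroid(train, query):
    """Stand-in classifier: pick the label whose summed feature vector is most similar."""
    centroids = {}
    for feats, label in train:
        # Summing instead of averaging is fine: cosine similarity ignores scale.
        centroids.setdefault(label, Counter()).update(feats)
    return max(centroids, key=lambda label: cosine(query, centroids[label]))


neighbors = {"T1": ["A"], "T2": ["B"], "T3": ["C", "D"]}
coexpressed = {("T1", "A"), ("T2", "B"), ("T3", "C")}
go_terms = {
    "A": ["sugar transport"],
    "B": ["amino acid transport"],
    "C": ["sugar transport"],
    "D": ["amino acid transport"],
}
train = [
    (neighbor_go_features("T1", neighbors, coexpressed, go_terms), "sugar"),
    (neighbor_go_features("T2", neighbors, coexpressed, go_terms), "amino acid"),
]
query = neighbor_go_features("T3", neighbors, coexpressed, go_terms)
prediction = nearest_centroid(train, query)
```

Neighbor D is adjacent to T3 but not co-expressed with it, so its annotation is ignored; that co-expression filter is the operon-inspired part of the idea.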


“…More recently, clustering analysis was used to identify and characterize cell types in various tissues and tumors in the colon 19 , brain 20 , blood 12 , and lung 21 , with the overall aim of finding key stem and progenitor cell populations involved in tissue development, repair, and tumorigenesis. Another application is to find sets of coordinately regulated genes in order to find gene modules 11,22,23 . Such clusters of genes (or other features such as single-nucleotide polymorphisms (SNPs) 24 ) can be further analyzed by gene set enrichment approaches to identify gene annotations 25 (e.g.…”

(mentioning, confidence: 99%)
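One common way to follow up a gene module with enrichment analysis is a one-sided hypergeometric over-representation test. The sketch assumes an explicitly defined gene universe and uses only the standard library; it is a generic example rather than the specific enrichment method cited in the snippet, and the function name is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(module, annotated, universe):
    """One-sided P(X >= overlap): chance a random gene set of the module's size hits the annotation that often."""
    universe = set(universe)
    module = set(module) & universe
    annotated = set(annotated) & universe
    m_total = len(universe)     # genes in the background
    k_annot = len(annotated)    # background genes carrying the annotation
    n_module = len(module)      # genes in the module
    overlap = len(module & annotated)
    # Sum the hypergeometric probabilities from the observed overlap upward.
    tail = sum(
        comb(k_annot, i) * comb(m_total - k_annot, n_module - i)
        for i in range(overlap, min(k_annot, n_module) + 1)
    )
    return tail / comb(m_total, n_module)


universe = {f"g{i}" for i in range(100)}
annotated = {f"g{i}" for i in range(10)}                               # 10 annotated genes
module = {f"g{i}" for i in range(5)} | {f"g{i}" for i in range(50, 55)}  # 5 of 10 annotated
p = hypergeom_enrichment_p(module, annotated, universe)
```

With 10% of the universe annotated, a 10-gene module would be expected to contain about one annotated gene, so observing five gives a small p-value. In practice the p-values over many annotation terms would also need multiple-testing correction.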

Applications of community detection algorithms to large biological datasets

Kanter, Yaari, Kalisky. 2019. Preprint.


Recent advances in data acquisition technologies in biology have led to major challenges in mining relevant information from large datasets. For example, single-cell RNA sequencing technologies are producing expression and sequence information from tens of thousands of cells in every single experiment. A common task in analyzing biological data is to cluster samples or features (e.g. genes) into groups sharing common characteristics. This is an NP-hard problem for which numerous heuristic algorithms have been developed. However, in many cases, the clusters created by these algorithms do not reflect biological reality. To overcome this, a Networks Based Clustering (NBC) approach was recently proposed, in which the samples or genes in the dataset are first mapped to a network and then community detection (CD) algorithms are used to identify clusters of nodes. Here, we created an open and flexible Python-based toolkit for NBC that enables easy and accessible network construction and community detection. We then tested the applicability of NBC for identifying clusters of cells or genes from previously published large-scale single-cell and bulk RNA-seq datasets. We show that NBC can be used to accurately and efficiently analyze large-scale datasets of RNA sequencing experiments.
Introduction
Advances in high-throughput genomic technologies have revolutionized the way biological data is acquired. Technologies like DNA sequencing (DNA-seq), RNA sequencing (RNA-seq), chromatin immunoprecipitation sequencing (ChIP-seq), and mass cytometry are becoming standard components of modern biological research. The majority of these datasets are publicly available for further large-scale studies. Notable examples include the Genotype-Tissue Expression (GTEx) project 1 , The Cancer Genome Atlas (TCGA) 2 , and the 1000 Genomes Project 3 . Examples of utilizing these datasets include studying allele-specific expression across tissues 4, 5 , characterizing functional variation in the human genome 6 , finding patterns of transcriptome variation across individuals and tissues 7 , and characterizing the global mutational landscape of cancer 8 . Moreover, some of these genomic technologies have recently been adapted to work at the single-cell level 9 . While pioneering single-cell RNA sequencing (scRNA-seq) studies were able to process relatively small numbers of cells (42 cells in 10 and 18 cells in 11 ), recent scRNA-seq studies taking advantage of automation and nanotechnology were able to produce expression and sequence data from many thousands of individual cells (∼1,500 cells in 12 and ∼40,000 cells in 13 ). Hence, biology is facing significant challenges in handling and analyzing large complex datasets 14,15 .
Clustering analysis
One of the common methods used for making sense of large biological datasets is cluster analysis: the task of grouping similar samples or features 16 . For example, clustering analysis has been used to identify subtypes of breast tumors 17, 18 with implications for treatment and prognosis. More r...
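The NBC idea described above (map samples to a network, then run community detection) can be sketched as follows. This assumes a symmetric k-nearest-neighbor graph on Euclidean distances and uses label propagation as a simple stand-in community detection algorithm; it is not the authors' toolkit or its API, the function names are hypothetical, and k = 3 is arbitrary. The dense pairwise-distance step is only suitable for small examples.

```python
from collections import Counter

import numpy as np


def knn_graph(X, k=5):
    """Symmetric boolean adjacency linking each row (cell) of X to its k nearest neighbors."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # dense pairwise distances
    np.fill_diagonal(d, np.inf)                                  # never pick a node as its own neighbor
    nearest = np.argsort(d, axis=1)[:, :k]
    A = np.zeros(d.shape, dtype=bool)
    rows = np.repeat(np.arange(X.shape[0]), k)
    A[rows, nearest.ravel()] = True
    return A | A.T  # keep an edge if either endpoint lists the other


def label_propagation(A, max_iter=100):
    """Each node repeatedly adopts the most common label among its neighbors (ties -> smallest)."""
    n = A.shape[0]
    labels = list(range(n))  # start with every node in its own community
    for _ in range(max_iter):
        changed = False
        for i in range(n):
            neigh = np.flatnonzero(A[i])
            if neigh.size == 0:
                continue
            counts = Counter(labels[j] for j in neigh)
            best = max(counts.values())
            new_label = min(lab for lab, c in counts.items() if c == best)
            if new_label != labels[i]:
                labels[i] = new_label
                changed = True
        if not changed:
            break
    return labels


# Two well-separated toy "cell populations" along one axis.
X = np.array([[float(i), 0.0] for i in range(10)] + [[100.0 + i, 0.0] for i in range(10)])
A = knn_graph(X, k=3)
labels = label_propagation(A)
```

Because no kNN edge crosses between the two groups, no label can spread from one population to the other. Real scRNA-seq pipelines typically reduce dimensionality first and use approximate neighbor search instead of the dense distance matrix.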
