2020
DOI: 10.1186/s13059-020-02116-x

Integrative analyses of single-cell transcriptome and regulome using MAESTRO

Abstract: We present Model-based AnalysEs of Transcriptome and RegulOme (MAESTRO), a comprehensive open-source computational workflow (http://github.com/liulab-dfci/MAESTRO) for the integrative analyses of single-cell RNA-seq (scRNA-seq) and ATAC-seq (scATAC-seq) data from multiple platforms. MAESTRO provides functions for pre-processing, alignment, quality control, expression and chromatin accessibility quantification, clustering, differential analysis, and annotation. By modeling gene regulato…

Citations: cited by 153 publications (161 citation statements)
References: 90 publications (106 reference statements)
“…We applied a standardized analysis workflow based on MAESTRO v1.1.0 (18) for processing all the collected datasets, including quality control, batch effect removal, cell clustering, differential expression analysis, cell-type annotation, malignant cell classification and gene set enrichment analysis (GSEA; Figure 2). The raw count, TPM or FPKM table was used as input for the standardized workflow.…”
Section: Methods (mentioning)
confidence: 99%
“…However, due to the substantial scale and sparsity of the region-by-cell count matrix, specialized bioinformatics tools have been developed, mostly for scATAC-seq data, to handle these assay-specific challenges 191,233–242. One major point in which these tools differ is the way they define genomic regions to be used as features, either as peaks from bulk or aggregated single-cell data (chromVAR 239, Cicero 238, cisTopic 191, scABC 241, Scasat 233, MAESTRO 242), peaks from pseudo-bulk samples 56 or fixed-size bins 56 (SnapATAC 243). Another difference between the bulk and single cell-based algorithms is what the count features represent, for example, counting reads in peaks (cisTopic 56,191, scABC 241, Scasat 233, MAESTRO 242), counting gapped k-mers under peaks or around transposase cut sites (BROCKMAN 234, chromVAR 239), or counting reads overlapping TF motifs in peaks or genome-wide (chromVAR 239, SCRAT 237) 244.…”
Section: Single-cell Data Analysis (mentioning)
confidence: 99%
“…One major point in which these tools differ is the way they define genomic regions to be used as features, either as peaks from bulk or aggregated single-cell data (chromVAR 239, Cicero 238, cisTopic 191, scABC 241, Scasat 233, MAESTRO 242), peaks from pseudo-bulk samples 56 or fixed-size bins 56 (SnapATAC 243). Another difference between the bulk and single cell-based algorithms is what the count features represent, for example, counting reads in peaks (cisTopic 56,191, scABC 241, Scasat 233, MAESTRO 242), counting gapped k-mers under peaks or around transposase cut sites (BROCKMAN 234, chromVAR 239), or counting reads overlapping TF motifs in peaks or genome-wide (chromVAR 239, SCRAT 237) 244. ArchR 236 uses an iterative feature definition method; it first defines a feature-by-cell count matrix of the number of reads per feature (in this case, 500-bp genomic bins) across all single cells, which then undergoes an iterative latent semantic indexing reduction to generate the cell clusters and pseudo-bulk samples on which peaks are called.…”
Section: Single-cell Data Analysis (mentioning)
confidence: 99%
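
The two feature definitions contrasted in the statements above (reads counted in called peaks versus reads counted in fixed-size genomic bins) can be illustrated with a minimal sketch. It does not reproduce the implementation of MAESTRO, SnapATAC, ArchR, or any other tool named here. The function names, the `(chrom, start, end, cell_barcode)` fragment tuples, the half-open coordinates, and the midpoint rule for assigning a fragment to a bin are all assumptions made for illustration; only the 500-bp bin size comes from the quoted text.

```python
from collections import defaultdict


def count_in_bins(fragments, bin_size=500):
    """Count fragments per (fixed-size bin, cell).

    fragments: iterable of (chrom, start, end, cell_barcode) tuples with
    half-open coordinates (hypothetical input layout).
    Each fragment is assigned to the bin containing its midpoint (an assumption).
    """
    counts = defaultdict(int)
    for chrom, start, end, cell in fragments:
        midpoint = (start + end) // 2
        bin_start = (midpoint // bin_size) * bin_size
        feature = (chrom, bin_start, bin_start + bin_size)
        counts[(feature, cell)] += 1
    return dict(counts)


def count_in_peaks(fragments, peaks):
    """Count fragments per (peak, cell).

    peaks: list of (chrom, start, end) tuples, half-open coordinates.
    A fragment counts toward every peak it overlaps by at least one base.
    The nested loop is quadratic; real tools use interval indexes instead.
    """
    counts = defaultdict(int)
    for chrom, start, end, cell in fragments:
        for p_chrom, p_start, p_end in peaks:
            if chrom == p_chrom and start < p_end and end > p_start:
                counts[((p_chrom, p_start, p_end), cell)] += 1
    return dict(counts)
```

Both functions return a sparse dictionary keyed by `(feature, cell)`, which mirrors the sparsity of the region-by-cell matrix noted in the quoted text; only the definition of `feature` changes between the two approaches.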
“…Taking the integration of scRNA-seq and scATAC-seq as an example, the matrix can be derived from scATAC-seq profiles by summing reads in gene bodies 17,19,23. This can also be input from the regulatory potential (RP) model in MAESTRO 16. In a simpler case where and have matched features, the integration tasks fall into two categories: 1) batch correction for scRNA-seq data across individuals, species, or technologies; 2) integration of scRNA-seq with spatial transcriptome data.…”
Section: Initializing Feature Matching Across Modalities (mentioning)
confidence: 99%
“…This is mathematically challenging, however, as there are many possible ways to simultaneously align a large number of cells and features. To address this challenge, existing computational approaches followed two directions: 1) aligning features empirically before aligning cells 16–19; 2) obtaining separate embeddings for each modality, followed by performing unsupervised manifold alignment 20–22. Taking integration of scRNA-seq and single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq) as an example, the first category of methods requires constructing a "gene activity matrix" from scATAC-seq data by counting DNA reads aligned near and within each gene 23.…”
Section: Introduction (mentioning)
confidence: 99%
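
The "gene activity matrix" described in the statement above, built by counting reads aligned near and within each gene, can be sketched as follows. This is an illustrative simplification and not MAESTRO's regulatory potential model, which the quoted text names as an alternative input. The function name, the input layouts, and the 2,000-bp upstream window are hypothetical choices made for this example.

```python
def gene_activity(fragments, genes, upstream=2000):
    """Count fragments overlapping each gene body plus an upstream window.

    fragments: iterable of (chrom, start, end, cell_barcode), half-open coords.
    genes: dict mapping gene name -> (chrom, start, end, strand).
    upstream: bases added on the 5' side of the gene (assumed value).
    Returns a sparse dict keyed by (gene_name, cell_barcode).
    """
    fragments = list(fragments)  # allow multiple passes over a generator
    activity = {}
    for name, (g_chrom, g_start, g_end, strand) in genes.items():
        # Extend toward the promoter: left of the gene on "+", right on "-".
        if strand == "+":
            lo, hi = max(0, g_start - upstream), g_end
        else:
            lo, hi = g_start, g_end + upstream
        for chrom, start, end, cell in fragments:
            if chrom == g_chrom and start < hi and end > lo:
                key = (name, cell)
                activity[key] = activity.get(key, 0) + 1
    return activity
```

The resulting gene-by-cell counts share their feature axis (genes) with an scRNA-seq expression matrix, which is what lets the first category of methods align features before aligning cells.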