Single-cell RNA-sequencing has become a powerful tool to study biologically significant characteristics at explicitly high resolution. However, its application on emerging data is currently limited by its intrinsic techniques. Here, we introduce Tissue-AdaPtive autoEncoder (TAPE), a deep learning method connecting bulk RNA-seq and single-cell RNA-seq to achieve precise deconvolution in a short time. By constructing an interpretable decoder and training under a unique scheme, TAPE can predict cell-type fractions and cell-type-specific gene expression tissue-adaptively. Compared with popular methods on several datasets, TAPE has a better overall performance and comparable accuracy at cell type level. Additionally, it is more robust among different cell types, faster, and sensitive to provide biologically meaningful predictions. Moreover, through the analysis of clinical data, TAPE shows its ability to predict cell-type-specific gene expression profiles with biological significance. We believe that TAPE will enable and accelerate the precise analysis of high-throughput clinical data in a wide range.
Muilti-modality data are ubiquitous in biology, especially that we have entered the multi-omics era, when we can measure the same biological object (cell) from different aspects (omics) to provide a more comprehensive insight into the cellular system. When dealing with such multi-omics data, the first step is to determine the correspondence among different modalities. In other words, we should match data from different spaces corresponding to the same object. This problem is particularly challenging in the single-cell multi-omics scenario because such data are very sparse with extremely high dimensions. Secondly, matched single-cell multi-omics data are rare and hard to collect. Furthermore, due to the limitations of the experimental environment, the data are usually highly noisy. To promote the single-cell multi-omics research, we overcome the above challenges, proposing a novel framework to align and integrate single-cell RNA-seq data and single-cell ATAC-seq data. Our approach can efficiently map the above data with high sparsity and noise from different spaces to a low-dimensional manifold in a unified space, making the downstream alignment and integration straightforward. Compared with the other state-of-the-art methods, our method performs better in both simulated and real single-cell data. The proposed method is helpful for the single-cell multi-omics research. The improvement for integration on the simulated data is significant.
Motivation We have entered the multi-omics era and can measure cells from different aspects. Hence, we can get a more comprehensive view by integrating or matching data from different spaces corresponding to the same object. However, it is particularly challenging in the single-cell multi-omics scenario because such data are very sparse with extremely high dimensions. Though some techniques can be used to measure scATAC-seq and scRNA-seq simultaneously, the data are usually highly noisy due to the limitations of the experimental environment. Results To promote single-cell multi-omics research, we overcome the above challenges, proposing a novel framework, contrastive cycle adversarial autoencoders, which can align and integrate single-cell RNA-seq data and single-cell ATAC-seq data. Con-AAE can efficiently map the above data with high sparsity and noise from different spaces to a coordinated subspace, where alignment and integration tasks can be easier. We demonstrate its advantages on several datasets. Availability Zenodo link: https://zenodo.org/badge/latestdoi/368779433 github: https://github.com/kakarotcq/Con-AAE. Supplementary information Supplementary data are available at Bioinformatics online.
As an important group of proteins discovered in phages, anti-CRISPR inhibits the activity of the immune system of bacteria (i.e., CRISPR-Cas), showing great potential for gene editing and phage therapy. However, the prediction and discovery of anti-CRISPR are challenging for its high variability and fast evolution. Existing biological studies often depend on known CRISPR and anti-CRISPR pairs, which may not be practical considering the huge number of pairs in reality. Computational methods usually struggle with prediction performance. To tackle these issues, we propose a novel deep learning method for anti-CRISPR analysis (DeepAcr), which achieves impressive performance. On both the cross-fold and cross-dataset validation, our method outperforms the previous state-of-the-art methods significantly. Impressively, DeepAcr improves the prediction performance by at least 40% regarding the F1 score for the cross-dataset test. Moreover, DeepAcr is the first computational method to predict the detailed anti-CRISPR classes, which may help illustrate the anti-CRISPR mechanism. Taking advantage of a Transformer protein language model pre-trained on 250 million protein sequences, DeepAcr overcomes the data scarcity problem. Extensive experiments and analysis suggest that Transformer model feature, evolutionary feature, and local structure feature complement each other, which indicates the critical properties of anti-CRISPR proteins. Combined with AlphaFold prediction, further motif analysis and docking experiments demonstrate that DeepAcr captures the evolutionarily conserved pattern and the interaction between anti-CRISPR and the target implicitly. With the impressive prediction capability, DeepAcr can serve as a valuable tool for anti-CRISPR study and new anti-CRISPR discovery.
Single-cell RNA-sequencing (RNA-seq) has become a powerful tool to study biologically significant characteristics at explicitly high resolution. However, its application on emerging data is currently limited by its intrinsic techniques. Though many methods have been proposed to analyze bulk data using single-cell profile as a reference, they are limited on the interpretability, processing speed, and data size requirement. Here, we introduce Tissue-AdaPtive autoEncoder (TAPE), a deep learning method connecting bulk RNA-seq and single-cell RNA-seq, to achieve precise prediction in short time. By constructing an interpretable decoder and training under a unique scheme, TAPE can predict cell-type fractions and cell-type-specific gene expression tissue-adaptively. Compared with existing methods on several benchmarking datasets, TAPE is more accurate (up to 40% improvement on the real bulk data) and faster. Thus, it is sensitive enough to provide biologically meaningful predictions. For example, only TAPE can predict the tendency of increasing monocytes-to-lymphocytes ratio in COVID-19 patients from mild to serious symptoms, of which estimated indices are consistent with laboratory data. More importantly, through the analysis of clinical data, TAPE shows its ability to predict cell-type-specific gene expression profiles with biological significance. Combining with single-sample gene set enrichment analysis (ssGSEA), TAPE also provides valuable clues to investigate the immune response in different virus-infected patients. We believe that TAPE will enable and accelerate the precise analysis of high-throughput clinical data in a wide range.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.