Background: DNA methylation has recently drawn considerable attention because it correlates strongly with abnormal gene activity and is informative of cancer status. As more studies focus on DNA methylation signatures in cancer, demand for publicly available methylome datasets has increased. To meet this demand, large-scale projects have been launched to discover biological insights into cancer and have released collections of such data. However, public cancer data remain limited for research use, especially for certain cancer types. Several simulation tools for producing epigenetic datasets have been introduced to alleviate this issue; to date, however, no tool has been proposed for generating datasets of a user-specified cancer type. Results: In this paper, we present methCancer-gen, a tool for generating DNA methylome datasets for a specified cancer type. Using a conditional variational autoencoder, a neural network-based generative model, it models the conditional distribution of the data through latent variables and generates samples for the specified cancer type. Conclusions: To evaluate how well methCancer-gen simulates data for a user-specified cancer type, we compared it with a benchmark method; it successfully reproduced cancer type-specific data with high accuracy, helping to alleviate the shortage of condition-specific data. methCancer-gen is publicly available at https://github.com/cbi-bioinfo/methCancer-gen.
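To illustrate the generative step described above, here is a minimal pure-Python sketch of how a conditional variational autoencoder (CVAE) produces samples for a chosen condition: a latent vector is drawn from the standard normal prior, concatenated with a one-hot cancer-type label, and decoded into values in (0, 1), which this sketch assumes are methylation beta values. It is not methCancer-gen's implementation. The decoder is a single untrained layer with random weights, training is only hinted at by the reparameterization and KL helpers, and all names (`ToyDecoder`, `generate`, etc.) are hypothetical.

```python
import math
import random

def one_hot(index, num_classes):
    """Encode a condition (here, a cancer-type index) as a one-hot vector."""
    vec = [0.0] * num_classes
    vec[index] = 1.0
    return vec

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reparameterize(mu, log_var, rng):
    """Reparameterization trick used during training: z = mu + sigma * eps, eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to the N(0, I) prior, summed over dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))

class ToyDecoder:
    """One linear layer + sigmoid mapping [z, c] to one value in (0, 1) per CpG site.

    Hypothetical and untrained: weights are random purely for illustration.
    """

    def __init__(self, latent_dim, num_classes, num_cpgs, seed=0):
        rng = random.Random(seed)
        in_dim = latent_dim + num_classes
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(num_cpgs)]
        self.bias = [0.0] * num_cpgs

    def __call__(self, z, c):
        x = z + c  # list concatenation: latent code followed by the condition vector
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.weights, self.bias)]

def generate(decoder, latent_dim, cancer_type, num_classes, n_samples, seed=0):
    """Draw z ~ N(0, I) from the prior and decode it with the requested cancer-type label."""
    rng = random.Random(seed)
    c = one_hot(cancer_type, num_classes)
    samples = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        samples.append(decoder(z, c))
    return samples

if __name__ == "__main__":
    decoder = ToyDecoder(latent_dim=4, num_classes=3, num_cpgs=5)
    fake = generate(decoder, latent_dim=4, cancer_type=2, num_classes=3, n_samples=10)
    print(len(fake), len(fake[0]))  # 10 5
```

Conditioning only changes the decoder input, so switching `cancer_type` steers generation toward a different class while the latent prior stays the same.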
The development of single-cell RNA sequencing (scRNA-seq) has enabled gene expression to be quantified at single-cell resolution. This advance is expected to address important questions that bulk RNA sequencing could not fully answer, such as inferring cell population heterogeneity and the genetic variability of cells, detecting rare cell types, and accurately predicting cell states and their localization. However, analyzing such large-scale data, especially when samples are collected at multiple time points, poses new challenges for mining informative genes compared with single-snapshot samples. The task becomes even more complicated when gene expression patterns must be mined from time-series scRNA-seq datasets generated under multiple conditions, which together form data with gene, condition, and time dimensions. Here, we focused on detecting gene expression patterns that capture the underlying biological differences among time-series scRNA-seq datasets of three different types of stem cells. Using our framework, we analyzed the gene expression profiles of 2,128 time-series scRNA-seq samples from long-term hematopoietic stem cells (LT-HSCs) and two of their progenitor cell types. We detected condition-specific feature genes that achieved 90.03% accuracy in classifying the three cell types. Examining the detected genes and clusters, we found that cell cycle-related genes showed significantly high variance among the three cell types. These results and the transcriptomic characteristics detected in our analysis were consistent with the original study. Collectively, the framework successfully detected biologically meaningful gene sets and expression patterns from multi-condition time-series scRNA-seq samples.
INDEX TERMS Gene expression, multi-class, single-cell, time-series.
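The abstract does not describe the framework's algorithm. As one simple, illustrative way to quantify genes with "high variance between the three cell types", the sketch below scores each gene by the variance of its mean expression across conditions and ranks the genes by that score. It is an assumption for illustration, not the authors' method: it ignores the time dimension entirely, and the gene names, condition labels, and values are hypothetical.

```python
from statistics import mean, pvariance

def between_condition_variance(expr, labels):
    """Score each gene by the variance of its per-condition mean expression.

    expr:   dict mapping gene name -> expression values, one per cell
    labels: condition label for each cell, aligned with the value lists
    """
    conditions = sorted(set(labels))
    scores = {}
    for gene, values in expr.items():
        group_means = [
            mean([v for v, lab in zip(values, labels) if lab == cond])
            for cond in conditions
        ]
        # Population variance of the condition means: large when the gene's
        # average level differs strongly between conditions.
        scores[gene] = pvariance(group_means)
    return scores

def top_genes(expr, labels, k):
    """Return the k genes with the highest between-condition variance."""
    scores = between_condition_variance(expr, labels)
    return sorted(scores, key=scores.get, reverse=True)[:k]

if __name__ == "__main__":
    # Hypothetical toy data: 6 cells from 3 conditions, 2 genes.
    labels = ["LT-HSC", "LT-HSC", "progenitor_1", "progenitor_1",
              "progenitor_2", "progenitor_2"]
    expr = {
        "gene_A": [1.0, 1.0, 5.0, 5.0, 9.0, 9.0],  # level shifts with condition
        "gene_B": [3.0, 3.0, 3.0, 3.0, 3.0, 3.0],  # constant across conditions
    }
    print(top_genes(expr, labels, 1))  # ['gene_A']
```

A real analysis would also account for within-condition spread and the time axis (for example, scoring each time point separately); this toy score only compares condition means.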
Clinical islet transplantation has recently emerged as a promising treatment option for patients with intractable type 1 diabetes. Although early graft loss has been well studied and controlled, the mechanisms of late graft loss remain largely obscure. Because long-term islet graft survival had not been achieved in islet xenotransplantation, it had been impossible to explore the mechanism of late islet graft loss. Fortunately, recent advances that achieved consistent long-term survival (≥6 months) of adult porcine islet grafts in five independent diabetic nonhuman primates (NHPs) enabled us to investigate late graft loss. Despite conventional immune monitoring in the post-transplant period, the onset of late graft loss could rarely be detected before overt graft loss became evident as uncontrolled blood glucose levels. We therefore retrospectively analyzed the gene expression profiles of two rhesus monkey recipients using peripheral blood RNA sequencing (RNA-seq) data to identify the potential cause(s) of late graft loss. Bioinformatic analyses showed that highly relevant immunological pathways were activated in the animal that experienced late graft failure. Further connectivity analyses revealed that activation of T cell signaling pathways was the most prominent, suggesting that T cell-mediated graft rejection could be the cause of late-phase islet loss. Indeed, the porcine islets in the biopsied monkey liver samples were heavily infiltrated with CD3+ T cells. Furthermore, a hypothesis test based on a computational experiment reinforced this conclusion. Taken together, our results suggest that bioinformatic analysis of peripheral blood RNA-seq data could reveal the cause of insidious late islet graft loss.
Identification of single-cell subtypes from single-cell RNA sequencing data is a fundamental step in understanding heterogeneous populations of cells. Previously, cell subtype identification was mainly performed with dimensionality reduction and clustering approaches that group cells with similar expression profiles. However, supervised classification approaches have been widely adopted because they are more robust to noise and annotate the subtype of each cell systematically. Recently, deep neural network (DNN) models have been applied widely in various fields, including biology. By capturing complex relationships between sample features and target outcomes, DNN models can substantially improve performance in biological data mining. In this paper, we constructed a DNN model called scDAE, which combines single-cell subtype identification with representative feature extraction using a multilayer denoising autoencoder (DAE). The features were learned by the DAE and further fine-tuned through fully connected layers with a softmax classifier. The model was compared against four state-of-the-art cell subtype identification methods and two conventional machine learning algorithms. Across multiple tests, scDAE significantly outperformed competing methods, especially on data sets with many cell subtypes and high noise. Cell features extracted by the proposed model clustered clearly by subtype. The experimental results indicate that our model is effective in identifying single-cell subtypes and the molecular signatures representative of each subtype. scDAE is publicly available at https://github.com/cbi-bioinfo/scDAE.
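The core idea of a denoising autoencoder, reconstructing a clean input from a corrupted copy and then reusing the learned encoding as features for a softmax classifier, can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not scDAE's implementation: it uses a single hidden layer rather than a multilayer DAE, assumes masking noise as the corruption scheme, uses untrained random weights, and omits training (backpropagation and fine-tuning) entirely. All names (`ToyDAEClassifier`, `corrupt`, etc.) are hypothetical.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def identity(v):
    return v

def dense(x, weights, bias, activation):
    """Fully connected layer: activation(W x + b)."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def random_layer(n_out, n_in, rng):
    """Small random weights and zero biases (untrained, illustration only)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out

def corrupt(x, drop_prob, rng):
    """Masking noise: each input feature is set to 0 with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]

def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

class ToyDAEClassifier:
    """One-hidden-layer denoising autoencoder with a softmax head on its encoded features."""

    def __init__(self, n_genes, n_hidden, n_subtypes, seed=0):
        rng = random.Random(seed)
        self.enc = random_layer(n_hidden, n_genes, rng)
        self.dec = random_layer(n_genes, n_hidden, rng)
        self.clf = random_layer(n_subtypes, n_hidden, rng)

    def encode(self, x):
        return dense(x, *self.enc, sigmoid)

    def reconstruction_loss(self, x, drop_prob, rng):
        """DAE objective: reconstruct the clean input x from a corrupted copy of it."""
        h = self.encode(corrupt(x, drop_prob, rng))
        x_hat = dense(h, *self.dec, identity)
        return mse(x_hat, x)

    def predict_proba(self, x):
        """Subtype probabilities from a softmax over logits computed on the encoded features."""
        return softmax(dense(self.encode(x), *self.clf, identity))

if __name__ == "__main__":
    model = ToyDAEClassifier(n_genes=6, n_hidden=3, n_subtypes=4)
    cell = [0.2, 1.5, 0.0, 3.1, 0.7, 0.0]  # hypothetical expression profile
    print(model.reconstruction_loss(cell, drop_prob=0.2, rng=random.Random(1)))
    print(model.predict_proba(cell))
```

In a trained model, `reconstruction_loss` would be minimized to learn the encoder, and the classifier head would then be trained on labeled cells. Corrupting only the input while scoring against the clean profile is what pushes the encoder toward noise-robust features.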