Exploring the mechanisms that maintain microbial community structure is important for understanding biofilm development and microbiota dysbiosis. In this paper, we propose a functional gene-based composition prediction (FCP) model to predict the population composition of a microbial community. The model predicts community composition well in both a low-complexity community, the acid mine drainage (AMD) microbiota, and a complex community, the human gut microbiota. Furthermore, we define community structure shaping (CSS) genes as the functional genes crucial for shaping a microbial community. We identified CSS genes in AMD and human gut microbiota samples with the FCP model and found that CSS genes change with conditions. Compared to genes essential for microbes, CSS genes are significantly enriched in genes involved in mobile genetic elements, cell motility, and defense mechanisms, indicating that the functions of CSS genes center on communication and strategies for responding to environmental factors. We further find that it is the minority, rather than the majority, that contributes to maintaining community structure. Compared to healthy control samples, some functional genes associated with the metabolism of amino acids, nucleotides, and lipopolysaccharide are more likely to be CSS genes in the disease group. CSS genes may help us understand critical cellular processes and may be useful in seeking addable gene circuitries to maintain artificial self-sustainable communities. Our study suggests that functional genes are important to the assembly of microbial communities.
The fruit fly, Drosophila melanogaster, has been used as a model organism for the molecular and genetic dissection of sleep behavior. However, most previous studies were based on qualitative or semi-quantitative characterizations. Here we quantified sleep in flies. We set up an assay that continuously tracks fly activity with an infrared camera, monitoring the movement of tens of flies simultaneously at high spatial and temporal resolution. We obtained accurate statistics on the rest and sleep patterns of single flies. Analysis of these data revealed a general pattern of rest and sleep: rest bout durations obeyed a power-law distribution, whereas sleep bout durations obeyed an exponential distribution. Thus, a resting fly starts to move again with a probability that decreases with the time it has rested, whereas a sleeping fly wakes up with a probability independent of how long it has slept. Rest transitions to sleep on a time scale of minutes. Our method allows quantitative investigation of resting and sleeping behaviors, and our results provide insights into the mechanisms of falling asleep and waking up.
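The distinction between the two distributions can be made concrete through their hazard rates: an exponential bout distribution implies a constant per-unit-time probability of the bout ending, while a power-law distribution implies an ending probability that decays with elapsed time. A minimal sketch of this, using hypothetical simulated bout durations and an illustrative `empirical_hazard` helper (the parameters and function are not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical bout durations (parameters chosen for illustration only):
# rest bouts drawn from a power law (Pareto), sleep bouts from an exponential.
rest = rng.pareto(1.5, 100_000) + 1.0            # power-law tail, minimum 1
sleep = rng.exponential(scale=10.0, size=100_000)

def empirical_hazard(durations, bins):
    """P(bout ends in [t, t + dt) | bout has lasted at least t)."""
    counts, edges = np.histogram(durations, bins=bins)
    at_risk = counts[::-1].cumsum()[::-1]        # bouts still ongoing at each bin start
    return counts / (at_risk * np.diff(edges))

bins = np.linspace(1, 30, 30)
h_rest = empirical_hazard(rest, bins)
h_sleep = empirical_hazard(sleep, bins)

# Exponential sleep bouts: hazard stays roughly flat (about 1/scale = 0.1),
# i.e., waking is independent of how long the fly has slept.
# Power-law rest bouts: hazard falls with elapsed time, i.e., the longer the
# fly has rested, the less likely it is to start moving in the next moment.
print(h_sleep[:3], h_rest[:3])
```

The flat versus decaying hazard is exactly the behavioral statement in the abstract, restated as a property of the two fitted distributions.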
Single-cell RNA-seq (scRNA-seq) is widely used to study transcriptomes, but it suffers from excessive zeros, some of which are true and others false. False zeros, which can be viewed as missing data, obstruct the downstream analysis of scRNA-seq data; the key problem is distinguishing true zeros from false ones. Here, we propose sparsity-penalized stacked denoising autoencoders (scSDAEs) to impute scRNA-seq data. scSDAEs adopt stacked denoising autoencoders with a sparsity penalty, as well as a layer-wise pretraining procedure, to improve model fitting. scSDAEs can capture nonlinear relationships in the data and incorporate information about the observed zeros. We tested the imputation efficiency of scSDAEs in recovering true gene expression values and supporting downstream analysis. First, we show that scSDAEs can recover the true values and the sample–sample correlations of bulk sequencing data with simulated noise. Next, we demonstrate that scSDAEs accurately impute an RNA mixture dataset with different dilutions and spike-in RNA concentrations affected by technical zeros, and improve the consistency of RNA and protein levels in CITE-seq data. Finally, we show that scSDAEs can help downstream clustering analysis. In this study, we develop a deep learning-based method, scSDAE, to impute scRNA-seq data affected by technical zeros, and we show that scSDAEs can recover the true values, to some extent, and aid downstream analysis.
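The two key ingredients named in the abstract, masking (denoising) noise and a sparsity penalty, can be illustrated with a deliberately reduced sketch: a single hidden layer trained by plain gradient descent on a toy matrix, standing in for the stacked, layer-wise pretrained architecture the paper describes. All sizes, rates, and the data-generating step are illustrative assumptions, not the authors' settings:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy cells-by-genes matrix with dropout: 30% of entries are zeroed to mimic
# technical ("false") zeros.
true = np.abs(rng.normal(1.0, 0.5, size=(200, 50)))
dropout = rng.random(true.shape) < 0.3
observed = np.where(dropout, 0.0, true)

n_in, n_hid = observed.shape[1], 16
W1 = rng.normal(0, 0.1, (n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(0, 0.1, (n_hid, n_in)); b2 = np.zeros(n_in)
lr, lam = 0.01, 1e-4                             # learning rate, sparsity weight
relu = lambda x: np.maximum(x, 0.0)

for _ in range(500):
    # Denoising: zero out a random 20% of the input, reconstruct the original.
    x = np.where(rng.random(observed.shape) < 0.2, 0.0, observed)
    h = relu(x @ W1 + b1)
    out = h @ W2 + b2
    g_out = 2 * (out - observed) / observed.shape[0]   # MSE gradient
    g_h = (g_out @ W2.T + lam * np.sign(h)) * (h > 0)  # + L1 sparsity term
    W2 -= lr * h.T @ g_out; b2 -= lr * g_out.sum(0)
    W1 -= lr * x.T @ g_h;   b1 -= lr * g_h.sum(0)

# Impute: run the observed matrix through the trained network and replace
# only the zeros, keeping observed non-zero values untouched.
recon = relu(observed @ W1 + b1) @ W2 + b2
imputed = np.where(observed == 0, np.clip(recon, 0, None), observed)
```

On this toy data the reconstructed values at dropout positions land closer to the truth than the observed zeros, which is the sense in which denoising reconstruction "fills in" false zeros; the real method additionally stacks layers and pretrains them one at a time.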
Convolutional neural network (CNN)-based methods have outperformed conventional machine learning methods in predicting DNA-protein binding preferences. Although past studies have shown that more convolutional kernels help achieve better performance, using many kernels can obscure model visualization, leading to overfitting and reduced interpretability, because the number of motifs in true models is limited. We therefore aim for high performance with a limited number of kernels in CNN-based models for motif inference. We present Deepprune, a novel deep learning framework that prunes the weights in the dense layer and fine-tunes iteratively. These two steps enable the training of CNN-based models with few kernels, allowing easy interpretation of the learned model. We demonstrate that Deepprune significantly improves motif inference performance on simulated datasets. Furthermore, we show that Deepprune outperforms the baseline with limited kernel numbers when inferring DNA-binding sites from ChIP-seq data.
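The prune-then-fine-tune loop can be sketched on a toy stand-in: a linear model over 32 "kernel" features (playing the role of the dense layer over max-pooled kernel activations), where only a few features are truly informative. This is an assumption-laden illustration of iterative magnitude pruning in general, not Deepprune's actual training code:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-in for the dense layer on top of pooled kernel activations:
# 32 "kernel" features, of which only the first 4 are truly informative.
X = rng.normal(size=(500, 32))
w_true = np.zeros(32); w_true[:4] = [2.0, -1.5, 1.0, 0.5]
y = X @ w_true + 0.1 * rng.normal(size=500)

w = rng.normal(0, 0.1, 32)
mask = np.ones(32, dtype=bool)
lr = 0.05

def fine_tune(steps):
    # Gradient descent on the surviving weights; pruned weights stay zero.
    for _ in range(steps):
        grad = X.T @ (X @ (w * mask) - y) / len(y)
        w[:] = (w - lr * grad) * mask

# Iterative pruning: train, then repeatedly drop the smallest-magnitude half
# of the surviving weights and fine-tune the remainder.
fine_tune(200)
for _ in range(3):
    alive = np.flatnonzero(mask)
    drop = alive[np.argsort(np.abs(w[alive]))[: len(alive) // 2]]
    mask[drop] = False
    fine_tune(200)

print(int(mask.sum()), "weights survive")        # pruning schedule: 32 -> 16 -> 8 -> 4
```

The surviving weights identify the informative features, mirroring the paper's motivation: when the number of true motifs is small, pruning the dense layer concentrates the model on the few kernels that matter.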
Keywords: Deep neural networks, Motif inference, Network pruning

BACKGROUND

Determining how proteins interact with DNA to regulate gene expression is essential for fully understanding many biological processes and disease states. Many DNA-binding proteins have affinity for specific DNA binding sites. ChIP-seq combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins (Zhang et al., 2008). However, DNA sequences obtained directly from experiments typically contain noise and bias. Consequently, many computational methods have been developed to predict protein-DNA binding, including conventional statistical methods (Badis et al., 2009; Ghandi et al., 2016) and deep learning-based methods (Alipanahi et al., 2015; Zhou and Troyanskaya, 2015; Zeng et al., 2016).
Convolutional neural networks (CNNs) have attracted attention for identifying protein-DNA binding motifs in many studies (Alipanahi et al., 2015; Zhou and Troyanskaya, 2015). Genomic sequences are first encoded in one-hot format; then a 1-D convolution with 4 channels is performed on them. In conventional machine learning methods, the sequence specificity of a protein is often characterized by a position weight matrix (PWM) (Stormo, 2000). A PWM has a direct connection to CNN-based models: from the viewpoint of a probabilistic model, the log-likelihood of a PWM on a DNA sequence is exactly the sum of a constant and the convolution of the corresponding kernel over the same sequence (Ding et al., 2018). Zeng et al. (2016) experimented with different structures and hyperparameters and showed that convolutional layers with more kernels achieve better performance. They also showed that training models with gradient descent methods is sensitive to weight initialization...
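The PWM-convolution connection above can be verified numerically: scoring a one-hot-encoded sequence with the log of the PWM as a convolution kernel reproduces the PWM log-likelihood at every offset. A small self-contained check (the PWM and sequence here are random, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# A random PWM for a motif of width 6: each row is a probability
# distribution over A, C, G, T. The kernel is its elementwise log.
pwm = rng.dirichlet(np.ones(4), size=6)          # shape (6, 4)
kernel = np.log(pwm)

seq = "ACGTTGCATGACGTAC"
onehot = np.zeros((len(seq), 4))
for i, base in enumerate(seq):
    onehot[i, "ACGT".index(base)] = 1.0

# Convolution view: slide the kernel along the one-hot sequence and take
# the dot product at each offset.
conv = np.array([np.sum(onehot[i:i + 6] * kernel)
                 for i in range(len(seq) - 6 + 1)])

# PWM view: log-likelihood of each length-6 window under the PWM.
loglik = np.array([sum(np.log(pwm[j, "ACGT".index(seq[i + j])])
                       for j in range(6))
                   for i in range(len(seq) - 6 + 1)])

# The two views coincide exactly; scoring against a background model would
# only add a constant per window, as Ding et al. (2018) note.
assert np.allclose(conv, loglik)
```

Because one-hot encoding selects exactly one log-probability per position, the windowed dot product and the log-likelihood sum are the same quantity; the "constant" in the statement above comes from a background model when a log-likelihood ratio is used.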