Mitochondrial mutations are increasingly recognised as informative endogenous genetic markers that can be used to reconstruct cellular clonal structure from single-cell RNA or DNA sequencing data. However, identifying informative mtDNA variants in noisy and sparse single-cell sequencing data remains challenging, and few computational methods are available. Here we present MQuad, an open-source computational tool that accurately calls clonally informative mtDNA variants in a population of single cells, together with an analysis suite for complete clonality inference based on single-cell RNA, DNA or ATAC sequencing data. Using a variety of simulated and experimental single-cell sequencing data, we show that MQuad identifies mitochondrial variants with both high sensitivity and specificity, outperforming existing methods by a large margin. Furthermore, we demonstrate its wide applicability across different single-cell sequencing protocols, particularly in complementing single-nucleotide and copy-number variations to extract finer clonal resolution.
Background Messenger RNA polyadenylation is an essential step in the maturation of most eukaryotic mRNAs. Accurate determination of poly(A) sites helps define the 3’-ends of genes, which is important for genome annotation and gene function research. Genomic studies have revealed the presence of poly(A) sites in intergenic regions, which may be attributed to 3’-UTR extensions and novel transcript units. However, there has been no systematic evaluation of intergenic poly(A) sites in plants. Results Approximately 16,000 intergenic poly(A) site clusters (IPACs) in Arabidopsis thaliana were discovered and evaluated at the whole-genome level. Based on the distributions of distance from IPACs to nearby sense and antisense genes, these IPACs were classified into three categories. About 70% of them came from previously unannotated 3’-UTR extensions to known genes, which would extend 6985 transcripts of the TAIR10 genome annotation beyond their 3’-ends, with a mean extension of 134 nucleotides. A further 1317 IPACs originated from novel intergenic transcripts, 37 of which were likely to be associated with protein-coding transcripts. Another 2957 IPACs corresponded to antisense transcripts for genes on the reverse strand, which might affect 2265 protein-coding genes and 39 non-protein-coding genes, including long non-coding RNA genes. The remaining IPACs may have originated from transcriptional read-through or gene mis-annotations. Conclusions The identified IPACs corresponding to novel transcripts, 3’-UTR extensions, and antisense transcription should be incorporated into the current Arabidopsis genome annotation. The comprehensive characterization of IPACs in this study provides insights into alternative polyadenylation and antisense transcription in plants. Electronic supplementary material The online version of this article (doi:10.1186/s12864-015-1691-1) contains supplementary material, which is available to authorized users.
Motivation Protein post-translational modifications (PTMs) regulate a wide range of cellular protein functions. Many PTM sites from the same (intra) or different (inter) proteins often cooperate with each other to perform a function, which is defined as PTM cross-talk. PTM cross-talk within proteins has attracted great attention in the past few years. However, inter-protein PTM cross-talk is largely understudied due to its large protein pair space and the lack of a gold standard dataset, even though the PTM interplay between proteins is a key element in cell signaling and regulatory networks. Results In this study, 199 inter-protein PTM cross-talk pairs in 82 pairs of human proteins were collected from the literature, which to our knowledge is the first effort to compile such a dataset. By comparing with background PTM pairs from the same protein pairs, we found that inter-protein cross-talk PTM pairs have higher sequence co-evolution at both the PTM residue and motif levels. We also found that cross-talk PTMs have higher co-modification across multiple species and 88 human tissues or conditions. Furthermore, we showed that these features are predictive of PTM cross-talk between proteins, and applied a random forest model to integrate them, achieving an area under the receiver operating characteristic curve of 0.81 in 10-fold cross-validation and outperforming any single feature alone. Therefore, this method would be a valuable tool for identifying inter-protein PTM cross-talk at a proteome-wide scale. Availability and implementation A web server for prioritization of both intra- and inter-protein PTM cross-talk candidates is at http://bioinfo.bjmu.edu.cn/ptm-x/. Python code for local use is also freely available at https://github.com/huangyh09/PTM-X. Supplementary information Supplementary data are available at Bioinformatics online.
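For readers unfamiliar with the evaluation metric quoted above, the following is a minimal sketch of how an area under the ROC curve can be computed and how items might be split into 10 folds. It is not the PTM-X implementation: the function names `roc_auc` and `k_fold_indices`, the round-robin fold assignment, and the toy labels and scores are all assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive example scores higher than a randomly
    chosen negative one (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_items, k=10):
    """Assign item indices to k folds in round-robin order (hypothetical
    helper; real pipelines usually shuffle and stratify first)."""
    return [list(range(i, n_items, k)) for i in range(k)]


# Toy data (made up): 1 = cross-talk pair, 0 = background pair.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
folds = k_fold_indices(25, k=10)
```

In a cross-validation setting, each fold in turn would be held out for scoring while a model is trained on the remaining folds, and the held-out scores would be passed to `roc_auc`.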