The information criterion of minimum message length (MML) provides a powerful statistical framework for inductive reasoning from observed data. We apply MML to the problem of protein sequence comparison using finite state models with Dirichlet distributions. The resulting framework allows us to supersede the ad hoc cost functions commonly used in the field by systematically addressing both the arbitrariness of alignment parameters and the disconnect between substitution scores and gap costs. Furthermore, our framework enables the generation of marginal probability landscapes over all possible alignment hypotheses, with the potential to help users simultaneously rationalize and assess competing alignment relationships between protein sequences, beyond simply reporting a single (best) alignment. We demonstrate the performance of our program on benchmarks containing distantly related protein sequences. Availability and implementation: The open-source program supporting this work is available from http://lcb.infotech.monash.edu.au/seqmmligner. Supplementary information: Supplementary data are available at Bioinformatics online.
The COVID-19 pandemic is an ongoing global health threat, yet our understanding of the cellular disease dynamics remains limited. In our unique COVID-19 human challenge study we used single cell genomics of nasopharyngeal swabs and blood to temporally resolve abortive, transient and sustained infections in 16 seronegative individuals challenged with pre-Alpha SARS-CoV-2. Our analyses revealed rapid changes in cell type proportions and dozens of highly dynamic cellular response states in epithelial and immune cells associated with specific timepoints or infection status. We observed that the interferon response in blood precedes that in the nasopharynx, and that nasopharyngeal immune infiltration occurred early in transient but later in sustained infection, and thus correlated with preventing sustained infection. Ciliated cells showed an acute response phase, upregulated MHC class II while infected, and were most permissive for viral replication, whilst nasal T cells and macrophages were infected non-productively. We resolve 54 T cell states, including acutely activated T cells that clonally expanded while carrying convergent SARS-CoV-2 motifs. Our novel computational pipeline (Cell2TCR) identifies activated antigen-responding clonotype groups and motifs in any dataset. Together, we show that our detailed time series data (covid19cellatlas.org) can serve as a 'Rosetta stone' for the epithelial and immune cell responses, and reveal early dynamic responses associated with protection from infection.
Motivation: Alignments are correspondences between sequences. How reliable are alignments of amino acid sequences of proteins, and what inferences about protein relationships can be drawn? Using techniques not previously applied to these questions, we weight every possible sequence alignment by its posterior probability to derive a formal mathematical expectation, and we develop an efficient algorithm for computing the distance between alternative alignments, allowing quantitative comparisons of sequence-based alignments with corresponding reference structure alignments. Results: By analyzing the sequences and structures of 1 million protein domain pairs, we report the variation of the expected distance between sequence-based and structure-based alignments, as a function of (Markov time of) sequence divergence. Our results clearly demarcate the ‘daylight’, ‘twilight’ and ‘midnight’ zones for interpreting residue–residue correspondences from sequence information alone. Supplementary information: Supplementary data are available at Bioinformatics online.
Single cell data analysis can infer dynamic changes in cell populations, for example across time, space or in response to perturbation. To compare these dynamics between two conditions, trajectory alignment via dynamic programming (DP) optimization is frequently used, but is limited by assumptions such as the definite existence of a match. Here we describe Genes2Genes, a Bayesian information-theoretic DP framework for aligning single-cell trajectories. Genes2Genes overcomes current limitations and is able to capture sequential matches and mismatches between a reference and a query at single gene resolution, highlighting distinct clusters of genes with varying patterns of gene expression dynamics. Across both real-life and simulated datasets, Genes2Genes accurately captured different alignment patterns, and revealed that T cells differentiated in vitro matched an immature in vivo state while lacking the final TNFα signaling. This use case demonstrates that precise trajectory alignment can pinpoint divergence from the in vivo system, thus providing an opportunity to optimize in vitro culture conditions.
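As a rough illustration of what DP trajectory alignment with explicit unmatched states can look like, the sketch below aligns two short numeric series and labels each step as a match or as an unmatched point on either side. It is not the Genes2Genes method: the function name `align`, the absolute-difference match cost, and the fixed `gap` penalty are all assumptions made for this example, whereas the abstract describes a Bayesian information-theoretic scoring scheme that is not reproduced here.

```python
from typing import List, Tuple


def align(ref: List[float], query: List[float], gap: float = 1.0) -> Tuple[float, str]:
    """Global DP alignment of two series (illustrative only).

    Returns (total cost, state string), where each state is:
      'M' = ref point matched to query point (cost = absolute difference),
      'D' = ref point left unmatched (cost = gap),
      'I' = query point left unmatched (cost = gap).
    """
    n, m = len(ref), len(query)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[""] * (m + 1) for _ in range(n + 1)]

    # Borders: everything seen so far on one side is unmatched.
    for i in range(1, n + 1):
        cost[i][0], back[i][0] = i * gap, "D"
    for j in range(1, m + 1):
        cost[0][j], back[0][j] = j * gap, "I"

    # Fill the table, keeping the cheapest of the three moves per cell.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (cost[i - 1][j - 1] + abs(ref[i - 1] - query[j - 1]), "M"),
                (cost[i - 1][j] + gap, "D"),
                (cost[i][j - 1] + gap, "I"),
            ]
            cost[i][j], back[i][j] = min(options)

    # Trace back from the bottom-right corner to recover the state path.
    states = []
    i, j = n, m
    while i > 0 or j > 0:
        s = back[i][j]
        states.append(s)
        if s == "M":
            i, j = i - 1, j - 1
        elif s == "D":
            i -= 1
        else:
            j -= 1
    return cost[n][m], "".join(reversed(states))
```

For example, `align([0, 1, 2], [0, 5, 1, 2])` leaves the outlying query value 5 unmatched instead of forcing it onto a reference point, which is the kind of behaviour a match-only alignment cannot express.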