While mRNA stability has been demonstrated to control rates of translation, generating both global and local synonymous codon biases in many unicellular organisms, this mechanism cannot adequately explain why codon bias strongly tracks neighboring intergenic GC content, suggesting that the structural dynamics of DNA might also influence codon choice. Because minor groove width is strongly governed by 3-base periodicity in GC content, the existence of triplet-based codons might imply a functional role for the optimization of local DNA molecular dynamics via GC content at synonymous sites (≈GC3). We confirm a strong association between GC3-related intrinsic DNA flexibility and codon bias across 24 different prokaryotic multiple whole-genome alignments. We develop a novel test of natural selection targeting synonymous sites and demonstrate that GC3-related DNA backbone dynamics have been subject to moderate selective pressure, perhaps contributing to our observation that many genes possess extreme DNA backbone dynamics for their given protein space. This dual function of codons may impose universal functional constraints affecting the evolution of both synonymous and non-synonymous sites. We propose that synonymous sites may have evolved as an 'accessory' during an early expansion of a primordial genetic code, allowing protein-coding and structural-dynamic information to be multiplexed within the same molecular context.
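GC3 itself is simple to compute. The sketch below, written in Python as our own choice (the abstract specifies no implementation), returns the fraction of G or C at the third position of each complete codon in a coding sequence. The function name `gc3`, the `synonymous_only` flag and the excluded-codon set (ATG and TGG, which are non-degenerate in the standard code, plus the three stop codons, which GC3s-style counts commonly drop) are illustrative assumptions. It does not reproduce the DNA-flexibility measure or the selection test described in the abstract.

```python
# Hypothetical helper: codons excluded when approximating GC at synonymous sites.
# ATG (Met) and TGG (Trp) have no synonymous alternatives in the standard code;
# stop codons are conventionally left out of GC3s-style counts.
EXCLUDED_CODONS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc3(cds: str, synonymous_only: bool = False) -> float:
    """Return the fraction of G/C at codon position 3 of an in-frame coding sequence.

    Trailing bases that do not form a complete codon are ignored.
    """
    seq = cds.upper()
    # Split into complete, in-frame codons only.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if synonymous_only:
        codons = [c for c in codons if c not in EXCLUDED_CODONS]
    if not codons:
        raise ValueError("no codons to score")
    # Count codons whose third base is G or C.
    return sum(c[2] in "GC" for c in codons) / len(codons)


if __name__ == "__main__":
    print(gc3("ATGGCCAAGTAA"))                        # 0.75
    print(gc3("ATGGCCAAGTAA", synonymous_only=True))  # 1.0
```

In the example, the codons ATG, GCC, AAG and TAA end in G, C, G and A, so GC3 is 3/4; dropping ATG and TAA leaves GCC and AAG, both ending in G or C. Whether to exclude non-degenerate and stop codons is a design choice that changes how closely the value tracks true synonymous-site GC content.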
Building a data-driven model to localize the origin of ventricular activation from 12-lead electrocardiograms (ECG) requires addressing the challenge of large anatomical and physiological variations across individuals. The alternative of a patient-specific model is, however, difficult to implement in clinical practice because training data must be obtained through invasive procedures. Here, we present a novel approach that overcomes the scarcity of clinical data by transferring knowledge from a large set of patient-specific simulation data while using domain adaptation to address the discrepancy between simulation and clinical data. The method we have developed quantifies non-uniformly distributed simulation errors, which are then incorporated into the domain adaptation process for both classification and regression. This yields a quantitative model that, as 12-lead ECG data from each patient are added, provides progressively improved patient-specific localizations of the origin of ventricular activation. We evaluated the method by localizing 75 in vivo pacing sites in three patients with premature ventricular contraction (PVC). The presented model improved localization accuracy relative to a model trained on clinical ECG data alone and to a model trained on combined simulation and clinical data without accounting for domain shift. Further, we demonstrated that the model improves the real-time prediction of the origin of ventricular activation with each additional clinical ECG, progressively guiding the clinician towards the target site.
A long-held presupposition in bioinformatics holds that genetic, and now even epigenetic, 'information' can be abstracted from the physicochemical details of the macromolecular polymers in which it resides. It is perhaps ironic that this basic conjecture originated with the first observations of DNA structure itself. This static model of DNA led very quickly to the conclusion that only the nucleobase sequence is rich enough in molecular complexity to replicate a complex biology. The idea has pervaded genomic science, higher education and popular culture ever since, to the point that most of us accept it unquestioningly as fact. More alarming still, this conjecture is driving a significant portion of technological development in modern genomics towards methods strongly rooted in DNA sequencing, thereby reducing a dynamic, multi-dimensional biology to one-dimensional forms of data. Evidence countering this central tenet of bioinformatics has been quietly mounting for decades, prompting some to propose that the genome must be studied from the perspective of its molecular reality rather than as a body of information to be represented symbolically. Here, we explore the epistemological boundary between bioinformatics and molecular biology, and warn against an 'overtly' bioinformatic perspective. We review a selection of new bioinformatic methods that move beyond sequence-based approaches to consider three-dimensional structures held in databases. However, we also note that these hybrid methods still ignore the most important element of gene function when attempting to improve outcomes: the fourth dimension of molecular dynamics over time.