The origin of the machinery that realizes protein biosynthesis in all organisms is still unclear. A key component of this machinery is the family of aminoacyl-tRNA synthetases (aaRS), which ligate tRNAs to amino acids while consuming ATP. Sequence analyses revealed that these enzymes can be divided into two complementary classes. The two classes differ significantly at the sequence and structure level, feature different reaction mechanisms, and occur in diverse oligomerization states. The one unifying aspect of both classes is their function of binding ATP. We identified Backbone Brackets and Arginine Tweezers as the most compact ATP binding motifs characteristic of each class. Geometric analysis shows a structural rearrangement of the Backbone Brackets upon ATP binding, indicating a mechanism common to all Class I structures. Regarding the origin of aaRS, the Rodin-Ohno hypothesis states that the peculiar nature of the two aaRS classes is the result of their primordial forms, called Protozymes, being encoded on opposite strands of the same gene. Backbone Brackets and Arginine Tweezers were traced back to the proposed Protozymes and their more efficient successors, the Urzymes. Both structural motifs can be observed as pairs of residues in contemporary structures, and it seems that the time of their addition, indicated by their placement in the ancient aaRS, coincides with the evolutionary trace of Proto- and Urzymes.
Storage and directed transfer of information is the key requirement for the development of life. Yet any information stored in our genes is useless without its correct interpretation. The genetic code defines the rule set to decode this information. Aminoacyl-tRNA synthetases are at the heart of this process. We extensively characterize how these enzymes distinguish all natural amino acids based on the computational analysis of crystallographic structure data. The results of this meta-analysis show that the correct read-out of genetic information is a delicate interplay between the composition of the binding site, non-covalent interactions, error correction mechanisms, and steric effects.
One of the most profound open questions in biology is how the genetic code was established. While proteins are encoded by nucleic acid blueprints, decoding this information in turn requires proteins. The emergence of this self-referencing system poses a chicken-or-egg dilemma, and its origin is still heavily debated 1,2. Aminoacyl-tRNA synthetases (aaRSs) implement the correct assignment of amino acids to their codons and are thus inherently connected to the emergence of genetic coding. These enzymes link tRNA molecules with their amino acid cargo and are consequently vital for protein biosynthesis. Besides the correct recognition of tRNA features 3, highly specific non-covalent interactions in the binding sites of aaRSs are required to correctly detect the designated amino acid 4-7 and to prevent errors in biosynthesis 5,8. The minimization of such errors represents the utmost barrier for the development of biological complexity 9, and accurate specification of aaRS binding sites is proposed to be one of the major determinants for the closure of the genetic code 10.
Besides binding site features, recognition fidelity is controlled by the ratio of concentrations of aaRSs and cognate tRNA molecules 11 and may involve secondary structure motifs in addition to side chain configurations 12,13.
Evolution. The evolutionary origin of aaRSs is hard to track. Phylogenetic analyses of aaRS sequences show that they do not follow the standard model of life 14; the development of aaRSs was nearly complete before the Last Universal Common Ancestor (LUCA) 15,16. Their complex evolutionary history included horizontal gene transfer, fusion, duplication, and recombination events 14,17-21. Sequence analyses 22 and subsequent structure investigations 23,24 revealed that aaRSs can be divided into two distinct classes (Class I and Class II) that share no similarities at the sequence or structure level. Each of the classes is responsible for 10 of the 20 proteinogenic amino acids and can be further grouped into subclasses 15. One exception to this class separation rule is lysyl-tRNA synthetase (LysRS), where euryarchaeal genomes were shown to contain a Class I form 25 instead of the standard Class II form. Most eukaryotic genomes contain the complete set of 20 aaRSs. However, some species lack certain aaRS-encoding genes and compensate for this by ...
Background: Machine learning strategies are prominent tools for data analysis. Especially in life sciences, they have become increasingly important to handle the growing datasets collected by the scientific community. Meanwhile, algorithms improve in performance, but also gain complexity, and tend to neglect interpretability and comprehensiveness of the resulting models.
Results: Generalized Matrix Learning Vector Quantization (GMLVQ) is a supervised, prototype-based machine learning method that provides comprehensive visualization capabilities not present in other classifiers, allowing for a fine-grained interpretation of the data. In contrast to commonly used machine learning strategies, GMLVQ is well-suited for imbalanced classification problems, which are frequent in life sciences. We present a Weka plug-in implementing GMLVQ. The feasibility of GMLVQ is demonstrated on a dataset of Early Folding Residues (EFR) that have been shown to initiate and guide the protein folding process. Using 27 features, an area under the receiver operating characteristic curve of 76.6% was achieved, which is comparable to other state-of-the-art classifiers. The obtained model is accessible at https://biosciences.hs-mittweida.de/efpred/.
Conclusions: The application to EFR prediction demonstrates how an easy interpretation of classification models can promote the comprehension of biological mechanisms. The results shed light on the special features of EFR which were reported as most influential for the classification: EFR are embedded in ordered secondary structure elements and they participate in networks of hydrophobic residues. Visualization capabilities of GMLVQ are presented as we demonstrate how to interpret the results.
Electronic supplementary material: The online version of this article (10.1186/s13040-018-0188-2) contains supplementary material, which is available to authorized users.
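To make the interpretability claim concrete, the following is a minimal sketch of how a GMLVQ-style classifier assigns a label and exposes per-feature relevances, assuming the commonly described formulation with an adaptive squared distance (x - w)^T Omega^T Omega (x - w). It is not the Weka plug-in's code: the function names, the matrix `OMEGA`, and the two toy prototypes are hypothetical values chosen for illustration, and training of the prototypes and of Omega is not shown.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def project(omega: Matrix, v: Vector) -> List[float]:
    """Return the matrix-vector product omega @ v (omega is row-major)."""
    return [sum(o * x for o, x in zip(row, v)) for row in omega]


def gmlvq_distance(x: Vector, prototype: Vector, omega: Matrix) -> float:
    """Adaptive squared distance (x - w)^T Omega^T Omega (x - w)."""
    diff = [a - b for a, b in zip(x, prototype)]
    return sum(p * p for p in project(omega, diff))


def classify(x: Vector, prototypes: Sequence[Tuple[Vector, str]], omega: Matrix) -> str:
    """Assign the label of the closest prototype under the adaptive distance."""
    return min(prototypes, key=lambda pl: gmlvq_distance(x, pl[0], omega))[1]


def feature_relevances(omega: Matrix) -> List[float]:
    """Diagonal of Lambda = Omega^T Omega: one relevance value per feature."""
    n_features = len(omega[0])
    return [sum(row[j] ** 2 for row in omega) for j in range(n_features)]


# Toy values invented for illustration only (not taken from the EFR dataset).
OMEGA = [[1.0, 0.0], [0.0, 0.1]]  # the second feature is weighted weakly
PROTOTYPES = [([0.0, 0.0], "non-EFR"), ([1.0, 5.0], "EFR")]

if __name__ == "__main__":
    sample = [0.9, 0.0]
    print(classify(sample, PROTOTYPES, OMEGA))
    print(feature_relevances(OMEGA))
```

In this toy setting the sample is closer to the "non-EFR" prototype in plain Euclidean terms, but the weak weighting of the second feature makes the "EFR" prototype the nearest one under the adaptive distance. Reading the diagonal of Omega^T Omega is one way such a model indicates which features drive the classification.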