Summary: RNA editing, a post-transcriptional process, allows the diversification of proteomes beyond the genomic blueprint; however, it is infrequently used among animals. Recent reports suggesting increased levels of RNA editing in squids thus raise the question of the nature and effects of these editing events in these organisms. Here we show that RNA editing is particularly common in behaviorally sophisticated coleoid cephalopods, with tens of thousands of evolutionarily conserved sites. Editing is enriched in the nervous system, affecting molecules pertinent to excitability and neuronal morphology. The genomic sequence flanking editing sites is highly conserved, suggesting that the process confers a selective advantage. Because of the large number of sites, the surrounding conservation greatly reduces the number of mutations and genomic polymorphisms in protein-coding regions. This trade-off between genome evolution and transcriptome plasticity highlights the importance of RNA recoding as a strategy for diversifying proteins, particularly those associated with neural function.
A new approach is introduced for analyzing and ultimately predicting protein structures, defined at the level of Cα coordinates. We analyze hexamers (oligopeptides of six amino acid residues) and show that their structure tends to concentrate in specific clusters rather than vary continuously. Thus, we can use a limited set of standard structural building blocks taken from these clusters as representatives of the repertoire of observed hexamers. We demonstrate that protein structures can be approximated by concatenating such building blocks. We have identified about 100 building blocks by applying clustering algorithms, and have shown that they can "replace" about 76% of all hexamers in well-refined known proteins with an error of less than 1 Å, and can be joined together to cover 99% of the residues. After replacing each hexamer by a standard building block with similar conformation, we can approximately reconstruct the actual structure by smoothly joining the overlapping building blocks into a full protein. The reconstructed structures show, in most cases, high resemblance to the original structure, although using a limited number of building blocks and local criteria of concatenating them is not likely to produce a very precise global match. Since these building blocks reflect, in many cases, some sequence dependency, it may be possible to use the results of this study as a basis for a protein structure prediction procedure.
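The replacement step described above can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not say how the error between a hexamer and a building block is measured, so the sketch assumes a distance RMSD (the RMS difference between the two sets of intra-hexamer Cα-Cα distances), which needs no superposition but cannot tell a conformation from its mirror image. The function names, the block library, and the use of the 1 Å figure as a cutoff on this particular measure are all illustrative assumptions.

```python
import math
from itertools import combinations

def hexamers(ca_coords):
    """Return every overlapping window of six consecutive C-alpha coordinates."""
    return [ca_coords[i:i + 6] for i in range(len(ca_coords) - 5)]

def distance_rmsd(a, b):
    """RMS difference between the internal distance sets of two fragments.

    Assumption: this superposition-free measure stands in for whatever
    error measure the original study used.
    """
    pairs = list(combinations(range(len(a)), 2))
    total = sum(
        (math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2 for i, j in pairs
    )
    return math.sqrt(total / len(pairs))

def assign_blocks(ca_coords, blocks, cutoff=1.0):
    """Map each hexamer to the index of its closest block, or None.

    `blocks` is a hypothetical library of representative hexamers;
    `cutoff` is in angstroms.
    """
    assignment = []
    for hexamer in hexamers(ca_coords):
        errors = [distance_rmsd(hexamer, block) for block in blocks]
        best = min(range(len(blocks)), key=errors.__getitem__)
        assignment.append(best if errors[best] < cutoff else None)
    return assignment
```

Given a list of `(x, y, z)` tuples for a chain, `assign_blocks(chain, blocks)` returns one entry per overlapping hexamer. The fraction of entries that are not `None` corresponds loosely to the "replaceable" fraction quoted in the abstract, under the assumed error measure.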
Abstract. Purpose: Allogeneic hematopoietic stem-cell transplantation (HSCT) is potentially curative for acute leukemia (AL), but carries considerable risk. Machine learning algorithms, which are part of the data mining (DM) approach, may serve for transplantation-related mortality risk prediction. Patients and Methods: This work is a retrospective DM study on a cohort of 28,236 adult HSCT recipients from the AL registry of the European Group for Blood and Marrow Transplantation. The primary objective was prediction of overall mortality (OM) at 100 days after HSCT. Secondary objectives were estimation of nonrelapse mortality, leukemia-free survival, and overall survival at 2 years. Donor, recipient, and procedural characteristics were analyzed. The alternating decision tree machine learning algorithm was applied for model development on 70% of the data set and validated on the remaining data. Results: OM prevalence at day 100 was 13.9% (n = 3,936). Of the 20 variables considered, 10 were selected by the model for OM prediction, and several interactions were discovered. By using a logistic transformation function, the crude score was transformed into individual probabilities for 100-day OM (range, 3% to 68%). The model's discrimination for the primary objective performed better than the European Group for Blood and Marrow Transplantation score (area under the receiver operating characteristics curve, 0.701 v 0.646; P < .001). Calibration was excellent. Scores assigned were also predictive of secondary objectives. Conclusion: The alternating decision tree model provides a robust tool for risk evaluation of patients with AL before HSCT, and is available online (http://bioinfo.lnx.biu.ac.il/~bondi/web1.html). It is presented as a continuous probabilistic score for the prediction of day 100 OM, extending prediction to 2 years. The DM method has proved useful for clinical prediction in HSCT.
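Two steps in the Results can be sketched in code: mapping a crude model score to a probability with a logistic function, and measuring discrimination as the area under the ROC curve. The abstract gives neither the fitted coefficients nor the data, so `intercept` and `slope` below are hypothetical placeholders, and the function names are illustrative rather than taken from the study.

```python
import math

def score_to_probability(score, intercept=0.0, slope=1.0):
    """Logistic transformation of a crude score into a probability.

    `intercept` and `slope` are placeholders; the fitted values are
    not reported in the abstract.
    """
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))

def roc_auc(predictions, outcomes):
    """Area under the ROC curve by pairwise comparison.

    Equals the probability that a randomly chosen case with the event
    (outcome 1) receives a higher prediction than a randomly chosen
    case without it (outcome 0); ties count as one half.
    """
    positives = [p for p, y in zip(predictions, outcomes) if y == 1]
    negatives = [p for p, y in zip(predictions, outcomes) if y == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Because the logistic function is strictly increasing, applying it to the crude score changes calibration but not the AUC: `roc_auc` gives the same value for the raw scores and for the transformed probabilities.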