Background: PET hydrolase from Ideonella sakaiensis might offer a response to PET accumulation in the environment. In this project, previously studied mutations were implemented and their performance was evaluated computationally with tools such as MODELLER, HADDOCK, PyMOL and GROMACS. One mutation that could lead to improved catalytic activity was proposed. Results: The PET hydrolase double mutant (DM) S209F W130H and I179 give interesting binding results with the studied ligands; however, a solution that combines both mutations does not seem viable, since the binding cleft becomes occluded. Following the same rationale, the triple mutant S209F W130H I179Q is proposed, which instead leaves space in the binding cleft for the ligand to enter and might bond with the oxygen of the ester group. In the experiments conducted, the triple mutant S209F W130H I179Q failed to beat the HADDOCK score of the DM; however, its experimental results could still show increased PET degradation. Surface charge results may indicate an increase in stability and binding affinity for the protein. Conclusions: Among the models implemented, DM S209F W130H seems the best regarding BHET or PET binding. Although protein engineering is a complex process, computational tools might provide a way of studying the binding sites of hypothetical proteins.
Background: Gene expression regulates several observed complex traits. In this study, datasets comprising transcriptome information and clinical traits regarding fat composition and vitals were analyzed with several statistical methods in order to find relations between genes and clinical outcomes. Results: Biological big data is diverse and voluminous, which makes for a complex case study and makes it difficult to establish a metric. Histological data with semi-quantitative scores proved unreliable for correlation with other vitals, such as cholesterol composition, which complicates prediction of clinical outcomes. A composition of vitals turned out to be a better variable for regression and a better factor for gene analysis. Several genes were found to be statistically significant by ANOVA across the progressive categories of the preferred clinical variable. Conclusions: ANOVA is proposed as a method for genetic information retrieval, extracting biological meaning from RNA-seq or microarray data while accounting for multiple classes of target variables. It provides a reliable statistical method to associate genes or clusters of genes with particular traits.
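The per-gene ANOVA idea in this abstract can be illustrated with a minimal sketch. It assumes each gene has one expression value per sample and each sample carries one ordered clinical category; the function names, gene names, and numbers below are hypothetical toy values, not data or code from the study. Only the F statistic is computed here; a p-value would additionally need the F distribution (for example via `scipy.stats.f_oneway`), followed by a multiple-testing correction.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of value groups."""
    k = len(groups)                      # number of classes
    n = sum(len(g) for g in groups)      # total number of samples
    grand = mean(v for g in groups for v in g)
    # Variation of class means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Variation of samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    if ms_within == 0:
        # Degenerate case: no spread inside any class.
        return float("inf") if ms_between > 0 else 0.0
    return ms_between / ms_within


def rank_genes(expression, labels):
    """Rank genes by F statistic across the clinical categories in `labels`.

    `expression` maps a gene name to one value per sample (hypothetical layout).
    """
    scores = {}
    for gene, values in expression.items():
        by_class = {}
        for value, label in zip(values, labels):
            by_class.setdefault(label, []).append(value)
        scores[gene] = one_way_anova_f(list(by_class.values()))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy example: three progressive categories, three samples each.
labels = ["low"] * 3 + ["mid"] * 3 + ["high"] * 3
expression = {
    "gene_a": [1, 2, 3, 2, 3, 4, 5, 6, 7],   # shifts with the category
    "gene_b": [1, 2, 3, 1, 2, 3, 1, 2, 3],   # identical in every category
}
ranked = rank_genes(expression, labels)
```

In this toy input, `gene_a` ranks first because its class means differ, while `gene_b` has identical class means and an F statistic of zero.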
The worldwide surge of multiresistant microbial strains has propelled the search for alternative treatment options. The study of Protein-Protein Interactions (PPIs) has been a cornerstone in the clarification of complex physiological and pathogenic processes, and is therefore a priority for the identification of vital components and mechanisms in pathogens. Despite the advances of laboratory techniques, computational models allow the screening of protein interactions between entire proteomes in a fast and inexpensive manner. Here, we present a supervised machine learning model for the prediction of PPIs based on the protein sequence. We cluster amino acids according to their physicochemical properties, and use the discrete cosine transform to represent protein sequences. A mesh of classifiers was constructed to create hyper-specialized classifiers dedicated to the most relevant pairs of molecular function annotations from Gene Ontology. Based on an exhaustive evaluation that includes datasets with different configurations, cross-validation and out-of-sample validation, the obtained results outscore the state-of-the-art for sequence-based methods. For the final mesh model using SVM with an RBF kernel, a consistent average AUC of 0.84 was attained.
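The sequence representation described in this abstract (amino-acid clustering followed by a discrete cosine transform) might look roughly like the sketch below. The abstract does not state which clustering or how many coefficients were used, so the seven-group scheme, the unnormalized DCT-II, the truncation length `n_coeffs`, and all function names are assumptions made for illustration only.

```python
import math

# Assumed seven-group clustering by physicochemical similarity; a scheme of
# this kind is common in sequence-based PPI work, but it is NOT taken from
# the abstract.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_GROUP = {aa: i + 1 for i, grp in enumerate(GROUPS) for aa in grp}


def encode(sequence):
    """Map a protein sequence to a list of cluster indices (1..7).

    Residues outside the 20 standard amino acids are skipped.
    """
    return [AA_TO_GROUP[aa] for aa in sequence.upper() if aa in AA_TO_GROUP]


def dct2(signal):
    """Unnormalized DCT-II: X_k = sum_i x_i * cos(pi/N * (i + 0.5) * k)."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(signal))
        for k in range(n)
    ]


def sequence_features(sequence, n_coeffs=8):
    """Fixed-length descriptor: the first `n_coeffs` DCT coefficients, zero-padded."""
    coeffs = dct2(encode(sequence))[:n_coeffs]
    return coeffs + [0.0] * (n_coeffs - len(coeffs))


def pair_features(seq_a, seq_b, n_coeffs=8):
    """Concatenate both descriptors as input for a pairwise classifier (assumed design)."""
    return sequence_features(seq_a, n_coeffs) + sequence_features(seq_b, n_coeffs)


features = pair_features("MKTAYIAKQR", "GAVLIFPC")
```

Keeping only the low-order coefficients gives every protein a descriptor of the same length regardless of sequence length, which is what a classifier such as an SVM needs as input.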
Diverse methods have been proposed for protein secondary structure prediction. However, the task remains a challenge in bioinformatics. In this article, several of these methods are implemented and analysed: first, a baseline using a Support Vector Machine; then a convolutional neural network (CNN), a Long Short-Term Memory (LSTM) network and an ensemble of both. Lastly, a novel technique, Secondary Structure of Physicochemical Clustered Proteins (SSPCP), is proposed, which combines multiple CNNs, each trained according to a clustering of protein features, by means of a neural network. The rationale behind SSPCP is that amino acids from proteins with similar physicochemical characteristics should receive the same secondary structure prediction for similar amino acids, whereas amino acids from differing proteins might have different structures. All of these methods use as features the PSSMs extracted from PSI-BLAST. For performance evaluation, the 25pdb dataset was split into training and validation sets, and the same subsets were used for all methods, achieving the following Q3 scores: CNN 70.11%, LSTM 69.25%, Ensemble 70.71%, SSPCP 70.91%. The experimental results show that features extracted from clustering the physicochemical properties of proteins seem to improve the accuracy of highly specific CNN models for protein secondary structure prediction.
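For readers unfamiliar with the Q3 metric reported above, the following minimal sketch shows the usual definition: the percentage of residues whose predicted three-state label (helix `H`, strand `E`, coil `C`) matches the observed one. The function name and the example strings are hypothetical and are not taken from the 25pdb dataset.

```python
def q3_score(predicted, observed):
    """Percentage of residues whose three-state label (H, E, C) is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Toy example: one strand residue is mispredicted as coil.
score = q3_score("HHHEECCC", "HHHEEECC")
```

Here seven of the eight residues agree, so the toy score is 87.5.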