IntFOLD is an independent web server that integrates our leading methods for structure and function prediction. The server provides a simple unified interface that aims to make complex protein modelling data more accessible to life scientists. The server web interface is designed to be intuitive and integrates a complex set of quantitative data, so that 3D modelling results can be viewed on a single page and interpreted by non-expert modellers at a glance. The only required input to the server is an amino acid sequence for the target protein. Here we describe major performance and user interface updates to the server, which comprises an integrated pipeline of methods for: tertiary structure prediction, global and local 3D model quality assessment, disorder prediction, structural domain prediction, function prediction and modelling of protein-ligand interactions. The server has been independently validated during numerous CASP (Critical Assessment of Techniques for Protein Structure Prediction) experiments, as well as being continuously evaluated by the CAMEO (Continuous Automated Model Evaluation) project. The IntFOLD server is available at: http://www.reading.ac.uk/bioinf/IntFOLD/
Our aim in CASP12 was to improve our Template-Based Modeling (TBM) methods through better model selection, accuracy self-estimate (ASE) scores and refinement. To meet this aim, we developed two new automated methods, which we used to score, rank, and improve upon the provided server models. The first, ModFOLD6_rank, provides improved global Quality Assessment (QA), model ranking and detection of local errors. The second, ReFOLD, fixes errors through iterative QA-guided refinement. For our automated predictions we developed the IntFOLD4-TS protocol, which integrates the ModFOLD6_rank method for scoring the multiple-template models that were generated using a number of alternative sequence-structure alignments. Overall, our selection of top models and ASE scores using ModFOLD6_rank was an improvement on our previous approaches. In addition, it was worthwhile attempting to repair the detected errors in the top selected models using ReFOLD, which gave us an overall gain in performance. According to the assessors' formula, the IntFOLD4 server ranked 3rd/5th (average Z-score > 0.0/-2.0) on the server-only targets, and our manual predictions (McGuffin group) ranked 1st/2nd (average Z-score > -2.0/0.0) compared to all other groups.
Advances in omics technologies have led to the discovery of genetic markers, or single nucleotide polymorphisms (SNPs), that are associated with particular diseases or complex traits. Although there have been significant improvements in the approaches used to analyse associations of SNPs with disease, further optimised and rapid techniques are needed to keep up with the rate of SNP discovery, which has exacerbated the ‘missing heritability’ problem. Here, we have devised a novel, integrated, heuristic-based, hybrid analytical computational pipeline for rapidly detecting novel or key genetic variants that are associated with diseases or complex traits. Our pipeline is particularly useful in genetic association studies where the genotyped SNP data are highly dimensional and the complex trait phenotype involved is continuous. In particular, the pipeline is more efficient for investigating small sets of genotyped SNPs defined in high-dimensional spaces that may be associated with continuous phenotypes than for investigating whole-genome variants. The pipeline, which employs a consensus approach based on the random forest, was able to rapidly identify previously unseen key SNPs that are significantly associated with the platelet response phenotype, which was used as our complex trait case study. Several of these SNPs, such as rs6141803 of COMMD7 and rs41316468 in PKT2B, have independently confirmed associations with cardiovascular diseases (CVDs) according to other unrelated studies, suggesting that our pipeline is robust in identifying key genetic variants. Our new pipeline provides an important step towards addressing the problem of ‘missing heritability’ through enhanced detection of key genetic variants (SNPs) that are associated with continuous complex traits/disease phenotypes.
Platelet activation involves different signalling pathways in the underlying thrombus formation process. These pathways result from platelet responses to activation by agonists. Previous analyses involving four pathways (P-selectin in response to adenosine diphosphate (ADP), P-selectin in response to cross-linked polypeptide (CRP), fibrinogen binding stimulated with ADP and fibrinogen binding stimulated with CRP) revealed genomic associations regulating these pathways. These analyses were performed on single nucleotide polymorphism (SNP) data, which typically contain a small number of observations (N) and a large number of variables or features (p). However, the methodologies used to analyse these genomic data involved linear models fitted by stepwise regression. We argue that this approach is suboptimal for linear modelling analyses. We propose an alternative approach using more robust methods, such as ridge regression and LASSO, that would identify previously unknown SNPs and describe their effects on the four pathways from the available large pool of SNPs. Methodology: The genome-wide association (GWA) data, containing 1554 SNPs for 512 individuals and describing their effects on four signalling pathways, were previously analysed statistically using stepwise regression [1], which is suboptimal [2]. We statistically re-analysed these data using both stepwise regression and shrinkage approaches. Results: Several of the SNPs and their associated genes identified using our new stepwise approach were not previously selected, though they are now found to be significant. Conclusion: We propose a shrinkage approach for linear models, using ridge regression and LASSO, for the statistical analysis of genomic data with large p and small N.
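To illustrate why a shrinkage method suits the large p, small N setting described above, the following is a minimal sketch of LASSO fitted by cyclic coordinate descent on a toy genotype matrix. It is not the authors' analysis: the data are synthetic, the sizes (40 individuals, 80 SNPs) are far smaller than the 512 individuals and 1554 SNPs in the abstract, and the function names, the two "causal" SNP indices and the penalty value are all assumptions chosen for illustration. Ridge regression is not shown.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within [-threshold, threshold] becomes 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=200):
    """Approximately minimise (1/(2N)) * ||y - X b||^2 + penalty * ||b||_1."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)  # equals y - X b while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant (centred) column carries no information
            # Covariance of column j with the residual that excludes SNP j's own term.
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * coef[j]
            new_value = soft_threshold(rho, penalty) / col_sq[j]
            delta = new_value - coef[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                coef[j] = new_value
    return coef


def make_toy_data(n_individuals=40, n_snps=80, seed=0):
    """Synthetic genotypes coded 0/1/2; only SNPs 3 and 17 affect the phenotype."""
    rng = random.Random(seed)
    genotypes = [[rng.choice((0, 1, 2)) for _ in range(n_snps)]
                 for _ in range(n_individuals)]
    phenotype = [1.5 * row[3] - 1.0 * row[17] + rng.gauss(0.0, 0.1)
                 for row in genotypes]
    # Centre columns and phenotype so that no intercept term is needed.
    col_means = [sum(row[j] for row in genotypes) / n_individuals
                 for j in range(n_snps)]
    X = [[row[j] - col_means[j] for j in range(n_snps)] for row in genotypes]
    y_mean = sum(phenotype) / n_individuals
    y = [value - y_mean for value in phenotype]
    return X, y


X, y = make_toy_data()
coef = lasso_coordinate_descent(X, y, penalty=0.1)  # penalty chosen arbitrarily
selected = [j for j, b in enumerate(coef) if b != 0.0]
print("SNPs with non-zero coefficients:", selected)
```

Because the L1 penalty sets many coefficients exactly to zero, the fit remains well defined even though there are more SNPs than individuals, a setting in which ordinary least squares has no unique solution. In practice the penalty would be chosen by cross-validation rather than fixed by hand.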