Summary

In recent years the amount of biological data has exploded to the point where much useful information can only be extracted by complex computational analyses. Such analyses are greatly facilitated by metadata standards, both in terms of the ability to compare data originating from different sources, and in terms of exchanging data in standard forms, e.g. when running processes on a distributed computing infrastructure. However, standards thrive on stability whereas science is constantly moving, with new methods being developed and old ones modified. Maintaining both the metadata standards and all the code required to make them useful is therefore a non-trivial problem. Memops is a framework that uses an abstract definition of the metadata (described in UML) to generate internal data structures and subroutine libraries for data access (application programming interfaces, or APIs, currently in Python, C and Java) and data storage (in XML files or databases). For the individual project these libraries obviate the need for writing code for input parsing, validity checking or output. Memops also ensures that the code is always internally consistent, massively reducing the need for code reorganisation. Across a scientific domain, a Memops-supported data model makes it easier to maintain complex standards that can capture all the data produced in that domain, share them among all programs in a complex software pipeline, and carry them forward to deposition in an archive. The principles behind the Memops generation code will be presented, along with example applications in Nuclear Magnetic Resonance (NMR) spectroscopy and structural biology.
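To give a flavour of the approach, the sketch below shows the kind of data-access code a Memops-style generator could emit from a UML class description. All names here (the Molecule class, its attributes and methods) are hypothetical illustrations rather than the actual generated API; the point is that typed attributes, validity checks and standard serialisation come from the model definition, so individual projects never hand-write parsing, checking or output code.

```python
import xml.etree.ElementTree as ET


class Molecule:
    """Illustrative data-access class for a hypothetical 'Molecule' UML class.

    In a Memops-style framework, a class like this would be generated
    automatically from the abstract model, not written by hand.
    """

    def __init__(self, parent, name, molType="protein"):
        self._parent = parent      # link to the owning (parent) object
        self.name = name           # assignment goes through the validated setter
        self.molType = molType

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        # Validity check derived from a (hypothetical) constraint in the model:
        # names must be non-empty strings.
        if not isinstance(value, str) or not value:
            raise ValueError("Molecule.name must be a non-empty string")
        self._name = value

    def toXml(self):
        # Standard XML serialisation generated for every class in the model,
        # so storage code stays consistent across the whole API.
        elem = ET.Element("Molecule", name=self.name, molType=self.molType)
        return ET.tostring(elem, encoding="unicode")
```

Because every class is generated from the same model description, a change to the model propagates consistently to the Python, C and Java APIs and to the XML or database storage layer, which is what keeps the code internally consistent without manual reorganisation.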
Introduction

In recent times, the combination of digitization, high-throughput approaches and modern computing techniques has revolutionized the relationship between scientists and data, in terms of both size and access. These advances present great opportunities but also create considerable problems. Most data now exists in electronic form at some point in its life, and it is therefore extremely important that data can be passed seamlessly between the many different programs that might be used to process and analyse it. If all scientific software were always written to some common data standard, this would not be difficult. In practice, however, it is a non-trivial problem. Science is primarily driven by the need to generate results rather than to conform to standards, even where such standards exist and can be agreed upon in constantly evolving fields. The need for standards remains, however. As high-throughput methodologies have proliferated, and networks have made it increasingly simple to move data to wherever it is needed, there has been increased interest in defining data standards across a large number of fields where immense amounts of data need to be organised and exploited. Recent reviews by Brazma et al. [1] on data standards and by Swertz and Jansen [2] on software infrastructure give a good account of both current efforts and the underlying...