Circular dichroism (CD) spectroscopy is a highly sensitive but low-resolution technique to study the structure of proteins. Combined with molecular modelling and other complementary techniques, CD spectroscopy can also provide essential information at higher resolution. To this aim, we introduce a new computational method to calculate the electronic circular dichroism spectra of proteins from a three-dimensional model structure or structural ensemble. The method determines the CD spectrum from the average secondary structure composition of the protein using a pre-calculated set of basis spectra. We derived several basis spectrum sets from the experimental CD spectra and secondary structure information of 71 reference proteins and tested the prediction accuracy of these basis spectrum sets through cross-validation. Furthermore, we investigated how prediction accuracy is affected by contributions from amino acid side chain groups and protein flexibility, potential experimental errors of the reference protein spectra, as well as the choice of the secondary structure classification algorithm and the number of basis spectra. We compared the predictive power of our method, SESCA, to previous spectrum prediction algorithms such as DichroCalc and PDB2CD and found that SESCA predicts the CD spectra with up to 50% smaller deviation. Our results indicate that SESCA basis sets are robust to experimental error in the reference spectra and to the choice of the secondary structure classification algorithm. For over 80% of the globular reference proteins, SESCA basis sets could accurately predict the experimental spectrum solely from their secondary structure composition. To improve SESCA predictions for the remaining proteins, we applied corrections to account for intensity normalization, contributions from the amino acid side chains, and conformational flexibility.
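The core idea stated above, a spectrum obtained from the secondary structure composition and a set of basis spectra, can be illustrated as a weighted sum. The following is only a minimal sketch under stated assumptions: the function names, the three structure classes, the four wavelengths, and all numerical values are hypothetical placeholders and are not taken from the SESCA basis sets. The use of a root-mean-square deviation to compare two spectra is likewise an assumption made for illustration.

```python
import math


def predict_cd_spectrum(ss_fractions, basis_spectra):
    """Return a spectrum as the fraction-weighted sum of basis spectra.

    ss_fractions:  dict mapping structure class -> fraction of residues
    basis_spectra: dict mapping structure class -> list of intensities,
                   all sampled at the same wavelengths
    """
    n_points = len(next(iter(basis_spectra.values())))
    spectrum = [0.0] * n_points
    for ss_class, fraction in ss_fractions.items():
        basis = basis_spectra[ss_class]
        if len(basis) != n_points:
            raise ValueError("basis spectra must share one wavelength grid")
        for i, intensity in enumerate(basis):
            spectrum[i] += fraction * intensity
    return spectrum


def rmsd(spectrum_a, spectrum_b):
    """Root-mean-square deviation between two spectra on the same grid."""
    if len(spectrum_a) != len(spectrum_b):
        raise ValueError("spectra must have the same number of points")
    squared = [(a - b) ** 2 for a, b in zip(spectrum_a, spectrum_b)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical placeholder data (NOT real basis spectra).
wavelengths_nm = [190, 200, 210, 220]
basis_spectra = {
    "helix": [70.0, 10.0, -30.0, -35.0],
    "strand": [20.0, 15.0, -10.0, -15.0],
    "coil": [-30.0, -20.0, -5.0, 2.0],
}
# Average composition of a model structure or ensemble (hypothetical).
ss_fractions = {"helix": 0.5, "strand": 0.2, "coil": 0.3}

predicted = predict_cd_spectrum(ss_fractions, basis_spectra)
# Hypothetical "experimental" spectrum used only to show the comparison.
experimental = [28.0, 3.0, -17.0, -21.0]
deviation = rmsd(predicted, experimental)
```

In this sketch, a smaller `deviation` would indicate that the proposed composition reproduces the measured spectrum more closely.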
For globular proteins, only intensity scaling improved the prediction accuracy significantly, but our models indicate that side chain contributions and structural flexibility are pivotal for the prediction of shorter peptides and intrinsically disordered proteins.

bioRxiv preprint first posted online Mar. 9, 2018; doi: http://dx.doi.org/10.1101/279752. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.

Author summary

Proteins are biomolecules that perform almost all active tasks in living organisms, and how they perform these tasks is defined by their structure. By understanding the structure of proteins, we can alter and regulate their biological functions, which may lead to many medical, scientific, and technological advancements. Here we present SESCA, a new method that allows the assessment and refinement of protein model structures. SESCA predicts the expected circular dichroism spectrum of a proposed protein model and compares it to an experimental...