After evaluating two current methods, we demonstrate that sequence-weighting techniques, which reduce sequence redundancy, and low-count corrections, which account for the small number of observations in sequence families of limited size, can significantly improve the predictive performance of mutual information (MI). The evaluation is made on large sets of both in silico-generated alignments and biological sequence data. The methods included in the analysis are the APC (average product correction) and RCW (row-column weighting) methods. The best-performing method was APC including sequence weighting and low-count corrections. The use of sequence permutations to calculate an MI rescaling is shown to significantly improve the prediction accuracy and allows for direct comparison of information values across protein families. Finally, we demonstrate that at least 400 sequences <62% identical are needed in a multiple sequence alignment (MSA) in order to achieve meaningful predictive performance. With our contribution, we achieve a noteworthy improvement on the current procedures to determine coevolution and residue contacts, and we believe that this will have potential impacts on the understanding of protein structure, function and folding.
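To make the APC step concrete, the following is a minimal sketch of raw column-pair MI followed by the average product correction, in which each pair's MI is reduced by the product of the two columns' mean MI divided by the overall mean MI. It is not the authors' implementation: the sequence weighting, low-count corrections and permutation-based rescaling described above are deliberately left out, and the function names (`column_mi`, `mi_matrix`, `apc_correct`) and the four-sequence toy alignment are assumptions made purely for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def column_mi(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p_ab * log2(p_ab / (p_a * p_b)), written with raw counts.
        mi += (n_ab / n) * math.log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def mi_matrix(msa):
    """Symmetric matrix of pairwise MI for all column pairs of an alignment."""
    length = len(msa[0])
    columns = [[seq[i] for seq in msa] for i in range(length)]
    mi = [[0.0] * length for _ in range(length)]
    for i, j in combinations(range(length), 2):
        mi[i][j] = mi[j][i] = column_mi(columns[i], columns[j])
    return mi


def apc_correct(mi):
    """Average product correction: MI(i,j) - mean_i * mean_j / overall mean."""
    length = len(mi)
    # Mean MI of each column against every other column (diagonal excluded).
    col_mean = [
        sum(mi[i][j] for j in range(length) if j != i) / (length - 1)
        for i in range(length)
    ]
    # Mean MI over all off-diagonal pairs.
    overall = sum(
        mi[i][j] for i in range(length) for j in range(length) if i != j
    ) / (length * (length - 1))
    corrected = [[0.0] * length for _ in range(length)]
    if overall == 0:
        return corrected  # no signal anywhere, nothing to correct
    for i, j in combinations(range(length), 2):
        corrected[i][j] = corrected[j][i] = (
            mi[i][j] - col_mean[i] * col_mean[j] / overall
        )
    return corrected


# Hypothetical toy alignment: columns 0 and 1 covary perfectly,
# column 2 varies independently, column 3 is fully conserved.
toy_msa = ["ACDA", "ACEA", "GTDA", "GTEA"]
toy_mi = mi_matrix(toy_msa)
toy_apc = apc_correct(toy_mi)
```

In this toy case the raw MI of columns 0 and 1 is 1 bit and all other pairs score 0, so the pair remains the top-ranked one after correction. A real alignment would first need the redundancy and low-count treatments discussed above before the scores could be trusted.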
The yeast Pichia pastoris is a cost-effective and easily scalable system for recombinant protein production. In this work we compared the conformation of the receptor binding domain (RBD) from severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) Spike protein expressed in P. pastoris and in the well-established HEK-293T mammalian cell system. RBD obtained from both yeast and mammalian cells was properly folded, as indicated by UV-absorption, circular dichroism and tryptophan fluorescence. The two variants also had similar stability, as indicated by temperature-induced unfolding (observed Tm values were 50 °C and 52 °C for RBD produced in P. pastoris and HEK-293T cells, respectively). Moreover, the stability of both variants was similarly reduced when the ionic strength was increased, in agreement with a computational analysis predicting that a set of ionic interactions may stabilize RBD structure. Further characterization by high-performance liquid chromatography, size-exclusion chromatography and mass spectrometry revealed a higher heterogeneity of RBD expressed in P. pastoris relative to that produced in HEK-293T cells, which disappeared after enzymatic removal of glycans. The production of RBD in P. pastoris was scaled up in a bioreactor, with yields above 45 mg/L of 90% pure protein, thus potentially allowing large-scale immunizations to produce neutralizing antibodies, as well as the large-scale production of serological tests for SARS-CoV-2.