The low reproducibility of published experimental results in many scientific disciplines has recently garnered negative attention in scientific journals and the general media. Public transparency, including the availability of 'raw' experimental data, will help to address growing concerns regarding scientific integrity. Macromolecular X-ray crystallography has led the way in requiring the public dissemination of atomic coordinates and a wealth of experimental data, making the field one of the most reproducible in the biological sciences. However, there remains no mandate for public disclosure of the original diffraction data. The Integrated Resource for Reproducibility in Macromolecular Crystallography (IRRMC) has been developed to archive raw data from diffraction experiments and, equally importantly, to provide related metadata. Currently, the database of our resource contains data from 2920 macromolecular diffraction experiments (5767 data sets), accounting for around 3% of all depositions in the Protein Data Bank (PDB), with their corresponding partially curated metadata. IRRMC utilizes distributed storage implemented using a federated architecture of many independent storage servers, which provides both scalability and sustainability. The resource, which is accessible via the web portal at http://www.proteindiffraction.org, can be searched using various criteria. All data are available for unrestricted access and download. The resource serves as a proof of concept and demonstrates the feasibility of archiving raw diffraction data and associated metadata from X-ray crystallographic studies of biological macromolecules. The goal is to expand this resource and include data sets that failed to yield X-ray structures in order to facilitate collaborative efforts that will improve protein structure-determination methods and to ensure the availability of 'orphan' data left behind for various reasons by individual investigators and/or extinct structural genomics projects.
A nonredundant set of 9081 protein crystal structures in the Protein Data Bank was used to examine the solvent content, the number of polypeptide chains, and the oligomeric states of proteins in crystals as a function of crystal symmetry (as classified by crystal systems and space groups). It was found that there is a correlation between solvent content and crystal symmetry. Surprisingly, proteins crystallizing in lower symmetry systems have lower solvent content compared to those crystallizing in higher symmetry systems. Nevertheless, there is no universal correlation between solvent content and preferences of macromolecules to crystallize in certain space groups. Crystal symmetry as a function of oligomeric state was examined, where trimers, tetramers, and hexamers were found to prefer to crystallize in systems where the oligomer symmetry could be incorporated in the crystal symmetry. Our analysis also shows that the frequency distribution within the enantiomorphous pairs of space groups does not differ significantly, in contrast to previous reports.

Keywords: solvent content; Matthews coefficient; protein crystals; oligomerization; space group frequency

Supplemental material: see www.proteinscience.org

Water plays an important role in the structure of biomolecules and often influences protein function. Water molecules not only affect protein folding, but also mediate biological processes such as enzymatic reactions and molecular recognition. Information about the fraction of water (solvent) plays a significant role in the X-ray structure determination process. First, knowledge of the solvent content helps to determine the number of molecules in the asymmetric unit (Matthews 1968), which is crucial in early stages of crystal structure determination. 
Second, an approximate value of solvent content is needed for significant phase improvement by solvent flattening methods (Wang 1985; Leslie 1987; Abrahams and Leslie 1996), which is necessary to resolve the inherent phase ambiguity in single anomalous diffraction (SAD) experiments. For both SAD and MAD (multiwavelength anomalous diffraction) (Hendrickson 1991; Hendrickson et al. 1990), phase improvement by solvent flattening is critical for low resolution data (Kirillova et al. 2007), especially when non-crystallographic symmetry cannot be applied.

Matthews (1968) observed that the solvent content in protein crystals ranged from 27% to 65%, with an average of 43%. He also showed that the quantity V_M (the Matthews coefficient, defined as the ratio of the volume of the asymmetric unit to the molecular weight of all
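The Matthews coefficient described above can be turned into a quick estimate of how many copies of a chain fit in the asymmetric unit. The following is a minimal sketch, not code from the paper: it assumes the commonly used relation between V_M and solvent fraction, solvent = 1 - 1.23 / V_M (which in turn assumes a typical protein partial specific volume of about 0.74 cm^3/g), and it uses the 27% to 65% range quoted from Matthews (1968) as a crude plausibility filter. All function and parameter names are hypothetical.

```python
# Å^3 per Da occupied by protein; assumes a partial specific volume of ~0.74 cm^3/g.
PROTEIN_VOLUME_PER_DALTON = 1.23


def matthews_coefficient(asu_volume, total_mw):
    """Return V_M (Å^3/Da): asymmetric-unit volume over total protein mass in it."""
    if asu_volume <= 0 or total_mw <= 0:
        raise ValueError("volume and molecular weight must be positive")
    return asu_volume / total_mw


def solvent_fraction(v_m):
    """Estimate the solvent fraction of the crystal from V_M."""
    return 1.0 - PROTEIN_VOLUME_PER_DALTON / v_m


def plausible_copy_numbers(cell_volume, n_sym_ops, chain_mw,
                           max_copies=12, solvent_range=(0.27, 0.65)):
    """List (copies, solvent fraction) pairs whose solvent content is plausible.

    cell_volume   -- unit-cell volume in Å^3
    n_sym_ops     -- number of asymmetric units per cell for the space group
    chain_mw      -- molecular weight of one chain in Da
    solvent_range -- accepted solvent fractions (assumed filter, see lead-in)
    """
    asu_volume = cell_volume / n_sym_ops
    low, high = solvent_range
    candidates = []
    for copies in range(1, max_copies + 1):
        v_m = matthews_coefficient(asu_volume, copies * chain_mw)
        solvent = solvent_fraction(v_m)
        if low <= solvent <= high:
            candidates.append((copies, round(solvent, 3)))
    return candidates
```

For an illustrative cell of 200,000 Å^3 with four asymmetric units and a 25 kDa chain, `plausible_copy_numbers(200000.0, 4, 25000.0)` keeps only one copy per asymmetric unit (V_M = 2.0 Å^3/Da, roughly 38% solvent); two copies would imply a negative solvent fraction. The fixed range is a deliberate simplification, since real crystals can fall outside it.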
Background: The efficacy of COVID-19 convalescent plasma (CCP) is primarily ascribed to its role as a source of neutralizing anti-SARS-CoV-2 antibodies. However, the composition of other immune components in CCP and their potential roles remain largely unexplored. This study aimed to describe the composition and concentrations of plasma cytokines and chemokines in eligible CCP donors.

Methods: A cross-sectional study was conducted among 20 pre-pandemic healthy blood donors without SARS-CoV-2 infection and 140 eligible CCP donors with confirmed SARS-CoV-2 infection. Electrochemiluminescence detection-based multiplexed sandwich immunoassays were used to quantify plasma cytokine and chemokine concentrations (n=35 analytes). A SARS-CoV-2 microneutralization assay was also performed. Differences in the percent detection and distribution of cytokine and chemokine concentrations were examined by categorical groups using Fisher's exact and Wilcoxon rank-sum tests, respectively.

Results: Among CCP donors (n=140), the median time since molecular diagnosis of SARS-CoV-2 was 44 days (interquartile range=38-50) and 9% (n=12) were hospitalized due to COVID-19. Compared to healthy blood donor controls, CCP donors had significantly higher plasma levels of IFN-γ, IL-10, IL-15, IL-21 and MCP-1, but lower levels of IL-1RA, IL-8, IL-16, and VEGF-A (P<0.0014). Significant differences were also observed in plasma levels of IL-8, IL-15 and IP-10 between CCP donors with low (<40) vs. high (≥160) anti-SARS-CoV-2 neutralizing antibody titers (P<0.0014). The median levels of IL-6, IL-8, TNF-α, IL-12/IL-23p40, and MDC were significantly higher among CCP donors who were hospitalized vs. non-hospitalized (P<0.05).

Conclusion: Heterogeneity in the cytokine and chemokine composition of CCP suggests that the inflammatory state of CCP donors differs from that of SARS-CoV-2 naïve, healthy blood donors.
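The comparison of percent detection between two donor groups mentioned in the Methods can be illustrated with a small two-sided Fisher's exact test. This is a sketch using only the Python standard library, not the study's analysis code: the function name is hypothetical, no real counts from the study are used, and the two-sided p-value follows the common convention of summing the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. donors vs. controls) and columns are outcomes
    (e.g. analyte detected vs. not detected). Returns the p-value.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denominator = comb(total, col1)

    def table_probability(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = table_probability(a)
    smallest = max(0, col1 - row2)
    largest = min(row1, col1)
    # Small relative tolerance so ties with the observed table are included
    # despite floating-point rounding.
    threshold = observed * (1 + 1e-9)
    p_value = sum(
        probability
        for probability in (table_probability(x) for x in range(smallest, largest + 1))
        if probability <= threshold
    )
    return min(1.0, p_value)
```

With group sizes like those in the abstract, a call would take the form `fisher_exact_two_sided(detected_ccp, 140 - detected_ccp, detected_control, 20 - detected_control)`, where both detected counts are placeholders, and the result would be compared against the reported significance threshold of 0.0014.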
The period 2000–2015 brought the advent of high-throughput approaches to protein structure determination. With overall funding on the order of $2 billion (in 2010 dollars), the structural genomics (SG) consortia established worldwide have developed pipelines for target selection, protein production, sample preparation, crystallization, and structure determination by X-ray crystallography and NMR. These efforts resulted in the determination of over 13,500 protein structures, mostly from unique protein families, and increased the structural coverage of the expanding protein universe. SG programs contributed over 4,400 publications to the scientific literature. The NIH-funded Protein Structure Initiatives (PSI) alone have produced over 2,000 scientific publications, which to date have attracted more than 93,000 citations. Software and database developments that were necessary to handle high-throughput structure determination workflows have led to structures of better quality and improved integrity of the associated data. Organized and accessible data have a positive impact on the reproducibility of scientific experiments. Most of the experimental data generated by the SG centers are freely available to the community and have been utilized by scientists in various fields of research. SG projects have created, improved, streamlined, and validated many protocols for protein production and crystallization, data collection, and functional analysis, significantly benefiting biological and biomedical research.
Modern high-throughput structural biology laboratories produce vast amounts of raw experimental data. The traditional method of data reduction is very simple: results are summarized in peer-reviewed publications, which are hopefully published in high-impact journals. By their nature, publications include only the most important results derived from experiments that may have been performed over the course of many years. The main content of the published paper is a concise compilation of these data, an interpretation of the experimental results, and a comparison of these results with those obtained by other scientists. Due to an avalanche of structural biology manuscripts submitted to scientific journals, in many recent cases descriptions of experimental methodology (and sometimes even experimental results) are pushed to supplementary materials that are published only online and may not be reviewed as thoroughly as the main body of a manuscript. Trouble may arise when experimental results contradict those obtained by other scientists, which requires (in the best case) reexamination of the original raw data or independent repetition of the experiment according to its published description. There are reports that a significant fraction of experiments performed in academic laboratories cannot be repeated in an industrial environment (Begley CG & Ellis LM, Nature 483(7391):531–3, 2012). This is not an indication of scientific fraud but rather reflects the inadequate description of experiments performed on different equipment and on biological samples that were produced with disparate methods. 
For that reason the goal of a modern data management system is not only the simple replacement of the laboratory notebook by an electronic one but also the creation of a sophisticated, internally consistent, scalable data management system that will combine data obtained by a variety of experiments performed by various individuals on diverse equipment. All data should be stored in a core database that can be used by custom applications to prepare internal reports, statistics, and perform other functions that are specific to the research that is pursued in a particular laboratory. This chapter presents a general overview of the methods of data management and analysis used by structural genomics (SG) programs. In addition to a review of the existing literature on the subject, also presented is experience in the development of two SG data management systems, UniTrack and LabDB. The description is targeted to a general audience, as some technical details have been (or will be) published elsewhere. The focus is on “data management,” meaning the process of gathering, organizing, and storing data, but also briefly discussed is “data mining,” the process of analysis ideally leading to an understanding of the data. In other words, data mining is the conversion of data into information. Clearly, effective data management is a precondition for any useful data mining. If done properly, gathering det...