The structural registration of chemically modified macromolecules is vital for the development of biopharmaceuticals. However, registering and searching such complex molecules has so far posed formidable performance challenges, since today's chemistry-oriented databases do not scale well to macromolecules. As a practical consequence, macromolecules tend to be stored in protein databases that focus on protein sequence only, and salient chemistry details are therefore lost. This article describes protein format extensions and the use of pseudoatoms for representing natural amino acids in chemical structures to allow high-performance registration and retrieval of large macromolecules. The representations include exact chemical modifications and enable lossless conversion between chemistry and sequence formats. Registration is done in parallel in both sequence and chemistry formats, and users can register and retrieve molecules in either format as they choose, resulting in what we call a BioChemformatics database. Having both sequence and chemistry formats available on demand allows for the construction of protein SAR tables with mixed sequence and chemistry information. Likewise, searches may combine sequence and chemistry terms and be performed in standard vendor applications like MDL's ISIS/Base or in in-house applications using standard SQL queries.
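The idea of a lossless round trip between a sequence view and a chemistry view can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the format described in the article: the brace notation, the `Residue` class, and the function names are hypothetical. Natural amino acids are kept as single one-letter tokens (standing in for pseudoatoms), while a modified residue carries its explicit chemistry as a SMILES string in braces.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# The 20 natural amino acids, each treated as one opaque token (a stand-in for a pseudoatom).
NATURAL = set("ACDEFGHIKLMNPQRSTVWY")

# Hypothetical notation: {SMILES} for a modified residue, one capital letter otherwise.
TOKEN = re.compile(r"\{([^{}]+)\}|([A-Z])")


@dataclass(frozen=True)
class Residue:
    code: Optional[str] = None    # one-letter code of a natural residue
    smiles: Optional[str] = None  # explicit chemistry of a modified residue


def parse(annotated: str) -> List[Residue]:
    """Split the annotated string into residues, rejecting anything unrecognized."""
    residues, pos = [], 0
    for match in TOKEN.finditer(annotated):
        if match.start() != pos:
            raise ValueError(f"unexpected text at position {pos}")
        pos = match.end()
        smiles, code = match.groups()
        if smiles is not None:
            residues.append(Residue(smiles=smiles))
        elif code in NATURAL:
            residues.append(Residue(code=code))
        else:
            raise ValueError(f"unknown residue code {code!r}")
    if pos != len(annotated):
        raise ValueError(f"unexpected text at position {pos}")
    return residues


def to_annotated(residues: List[Residue]) -> str:
    """Write the residues back out; the inverse of parse()."""
    return "".join(r.code if r.code else "{" + r.smiles + "}" for r in residues)


def sequence_view(residues: List[Residue]) -> str:
    """Sequence-only view: modified residues collapse to 'X' (lossy by design)."""
    return "".join(r.code or "X" for r in residues)
```

For example, parsing `"HAEG{N[C@@H](COP(=O)(O)O)C(=O)O}FT"` and writing it back with `to_annotated` reproduces the input exactly, while `sequence_view` reduces it to a plain sequence with a placeholder for the modified residue. The point of keeping both views is that the sequence view supports sequence-style searches while the stored chemistry is never discarded.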
The isoelectric point (pI) of a peptide is a physicochemical property that can be accurately predicted from the sequence of the peptide when the peptide is built from natural amino acids. Peptides can, however, carry chemical modifications, such as phosphorylations, amidations, and unnatural amino acids, which can result in erroneous predictions if not accounted for. Here we report on an open-source program, pICalculax, which can handle pI calculations of modified peptides in an extensible way. Tests on a database of modified peptides and experimentally determined pI values show an improvement in pI predictions when the modifications are taken into account. The correlation coefficient improves from 0.45 to 0.91, and the root-mean-square deviation likewise improves from 3.3 to 0.9. The program is available at https://github.com/EBjerrum/pICalculax.
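To make the effect of modifications concrete, the following is a minimal sketch of a sequence-based pI estimate. It does not use or reproduce the pICalculax API. It assumes the common textbook approach of summing Henderson-Hasselbalch charges over ionizable groups and finding the pH of zero net charge by bisection. The pKa values are illustrative assumptions, and the function names and the `amidated` and `extra_acidic` parameters are hypothetical.

```python
# Illustrative pKa values (assumptions; published pKa sets differ from one another).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}             # basic side chains
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}    # acidic side chains
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(seq, ph, amidated=False, extra_acidic=()):
    """Net charge at a given pH from Henderson-Hasselbalch fractions.

    amidated:     True if the C-terminus is amidated (no C-terminal carboxylate).
    extra_acidic: pKa values of additional acidic groups, e.g. from a phosphate.
    """
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))           # N-terminal amine
    if not amidated:
        charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))      # C-terminal carboxyl
    for aa in seq:
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    for pka in extra_acidic:
        charge -= 1.0 / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(seq, amidated=False, extra_acidic=(), tol=1e-4):
    """Bisect on pH in [0, 14]; net charge decreases monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid, amidated, extra_acidic) > 0:
            lo = mid   # still positively charged: pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Under these assumptions, calling `isoelectric_point("ADKG", amidated=True)` gives a higher value than `isoelectric_point("ADKG")`, because amidation removes one acidic group, and passing hypothetical phosphate pKa values such as `extra_acidic=(1.2, 6.5)` gives a lower one. A peptide with no acidic groups at all never reaches zero charge in this model, so the bisection simply returns the upper pH bound.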