The variation in amino acid sequence within sets of peptides is described by three principal properties, z1, z2, and z3, per varied amino acid position. These principal properties are derived from a principal components analysis of a matrix of 29 physicochemical variables for the 20 coded (in mRNA) amino acids. The scales z1, z2, and z3 are used to construct informative sets of analogues for exploring and developing quantitative structure-activity relationships (QSAR) of peptides. For the QSARs, the multivariate partial least squares (PLS) method is used. Multivariate QSARs are developed for four families of peptides, and it is shown how these QSARs can predict the activity of new peptide analogues.
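The derivation of principal properties described above can be sketched as a principal components analysis of an autoscaled descriptor matrix. This is only an illustrative sketch: the 20 × 29 matrix below is random placeholder data, not the physicochemical table used in the paper. The function name `principal_properties` and the choice of unit-variance scaling are assumptions made for this example.

```python
import numpy as np

def principal_properties(X, n_components=3):
    """Return PCA scores (rows x n_components) and explained-variance ratios."""
    X = np.asarray(X, dtype=float)
    # Autoscale (assumed preprocessing): centre each column, scale to unit variance.
    Xc = X - X.mean(axis=0)
    Xs = Xc / Xc.std(axis=0, ddof=1)
    # SVD of the scaled matrix; the PCA scores are U * S.
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]

rng = np.random.default_rng(0)
# Placeholder for a 20 amino acids x 29 variables table; NOT real data.
X_placeholder = rng.normal(size=(20, 29))
z, explained = principal_properties(X_placeholder)
print(z.shape)  # (20, 3): one row of z1, z2, z3 per amino acid
```

With real descriptor data in place of the placeholder, each row of `z` would play the role of the three scales for one amino acid.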
The information content of previously published peptide sets was compared with that of smaller sets of peptides selected according to statistical designs. It was found that minimum analogue peptide sets (MAPS) constructed by factorial or fractional factorial designs in physicochemical properties contained substantial structure-activity information. Although five to six times smaller than the originally published peptide sets, the MAPS resulted in QSAR models able to predict biological activity. The QSARs derived from a MAPS of nine dipeptides and from a set of 58 dipeptides inhibiting angiotensin converting enzyme were compared and found to be of equal strength. Furthermore, for a set of bitter-tasting dipeptides, it was found that an incomplete MAPS of 10 dipeptides gave just as good a model as the one based on a set of 48 dipeptides. By comparison, other non-designed sets of peptides gave QSARs with poor predictive power. It was also demonstrated how MAPS centered on a lead peptide can be constructed so as to specifically explore the physicochemical and biological properties in the vicinity of the lead. It was concluded that small, information-rich peptide sets (MAPS) can be constructed on the basis of statistical designs with principal properties of amino acids as design variables.
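To make the idea of a fractional factorial design in physicochemical properties concrete, the sketch below generates a two-level design in six variables with eight runs. It assumes, for illustration only, that a dipeptide is described by three principal properties at each of two positions; the generators (d = a·b, e = a·c, f = b·c) and the function name are choices made here, and the abstract does not state which design the authors actually used.

```python
from itertools import product

def fractional_factorial_2_6_3():
    """Eight-run two-level design in six variables (a 2^(6-3) design).

    Three base columns (a, b, c) form a full 2^3 factorial; the other
    three are generated as products: d = a*b, e = a*c, f = b*c.
    """
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        runs.append((a, b, c, a * b, a * c, b * c))
    return runs

design = fractional_factorial_2_6_3()
for run in design:
    # Assumed layout: the first three entries are target signs of the
    # principal properties at position 1, the last three at position 2.
    print(run[:3], run[3:])
```

Each run gives a target sign pattern; in practice one would then pick, for each position, an amino acid whose principal properties best match the pattern. Every column is balanced and orthogonal to the others, which is what lets a small set carry so much structure-activity information.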
A set of 100 aromatic substituents was characterized multivariately by nine descriptor variables taken from the literature. From the resulting 9 × 100 data set, four principal properties for the aromatic substituents were calculated as the first four dimensions of a principal components analysis (PCA). The first three principal properties were used to develop a strategy for selecting substituents from eight subgroups according to a factorial design.
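One way to read the eight subgroups is as the 2³ sign combinations of the first three principal properties. The sketch below groups items by those sign patterns and picks one representative per group. The scores are random placeholders rather than real substituent data, and both the function name `select_by_octant` and the rule of taking the item farthest from the origin are assumptions for illustration, not the published selection strategy.

```python
import random

def select_by_octant(scores):
    """Pick one item per sign pattern of three principal-property scores.

    Returns {sign_pattern: index}. Within an octant, the item farthest
    from the origin is chosen (an assumed rule, used here for illustration).
    Scores of exactly zero are treated as positive.
    """
    best = {}
    for idx, (t1, t2, t3) in enumerate(scores):
        key = tuple(1 if t >= 0 else -1 for t in (t1, t2, t3))
        dist2 = t1 * t1 + t2 * t2 + t3 * t3
        if key not in best or dist2 > best[key][0]:
            best[key] = (dist2, idx)
    return {key: idx for key, (dist2, idx) in best.items()}

rng = random.Random(0)
# Placeholder scores for 100 substituents; NOT real data.
placeholder_scores = [tuple(rng.gauss(0, 1) for _ in range(3)) for _ in range(100)]
chosen = select_by_octant(placeholder_scores)
for pattern in sorted(chosen):
    print(pattern, chosen[pattern])
```

Choosing one substituent from each populated octant yields a selection that spans the low and high levels of all three principal properties, in the spirit of a two-level factorial design.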