The National Institute of Environmental Health Sciences (NIEHS) and Brogan & Partners are collaborating with JSTOR to digitize, preserve and extend access to Environmental Health Perspectives.

Often a compound's biological activity is determined by complex relationships among its structural components. Such relationships can often be adequately described and exploited only by multivariate structure-activity relationship (SAR) studies, which can deal with many variables simultaneously. Pattern recognition (PR) is a multivariate technique that is well suited to the qualitative (active-inactive) data often supplied by biological assays. PR studies of compounds of known activity can yield information that allows the activity of untested compounds to be predicted. ADAPT is a computerized system that was developed for such PR-SAR studies. A general introduction to this field is presented, and the methodology used for such a study is described in the context of an actual study of mutagenic compounds. The data requirements, descriptor generation, and the details of a PR study are discussed. In addition, the example study was chosen to highlight the problems that may occur if a study is not well formulated and carefully executed. Current work and future plans for computerized mutagen screening are discussed.

FIGURE 1. Boiling point vs. melting point for some simple aldehydes (A) and ketones (K). Note that the two classes cluster in separate regions of the plot. Z is a compound of unknown classification.
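The idea behind Figure 1, assigning an unknown compound Z to the class of the patterns that surround it, can be sketched as a k-nearest-neighbor vote. This is only an illustrative sketch and not the procedure used by ADAPT: the function name `knn_classify`, the choice of k = 3, the Euclidean distance, and the coordinates are all assumptions made for the example. The coordinates are synthetic placeholders, not measured melting or boiling points.

```python
import math
from collections import Counter

# Synthetic two-descriptor patterns (hypothetical values, NOT measured
# melting or boiling points). Each entry is ((descriptor_1, descriptor_2), class).
training = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((2.0, 1.2), "A"),
    ((6.0, 6.5), "K"),
    ((7.0, 6.0), "K"),
    ((6.5, 7.2), "K"),
]


def knn_classify(point, patterns, k=3):
    """Return the majority class among the k patterns closest to `point`."""
    # Rank the known patterns by Euclidean distance in descriptor space.
    ranked = sorted(patterns, key=lambda p: math.dist(point, p[0]))
    # Count the class labels of the k nearest patterns and take the majority.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# An unknown pattern Z that lies among the "A" patterns.
z = (1.8, 1.6)
print(knn_classify(z, training))
```

With real data the descriptors would normally be scaled to comparable ranges first, since an unscaled descriptor with a large numerical range would dominate the distance.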
eral assumptions:

(1) The activity and its variation can be explained by variations within the structures.
(2) The structures can be sufficiently described by numerical indices (descriptors).
(3) Pattern recognition techniques can be used to discover a relationship between the descriptors and the activity.
(4) This relationship can be extrapolated to untested compounds.

Each compound in a study is referred to as an observation or pattern, and each structural feature or experimental property is referred to as a variable or descriptor. Simple problems can be viewed graphically, as in Figure 1, which plots several aldehydes (A) and ketones (K) by their melting points and boiling points. In the two-dimensional "space" defined by these physical parameters, the aldehydes cluster in a different region than the ketones. This differential clustering is the basis for pattern recognition. If a new compound (Z) is plotted in the same space, the likelihood is high that it belongs to the same class as its neighboring patterns, in this case the aldehydes. This is a very simple example of cluster analysis. Techniques that generate discriminant functions also rely on this clus...