This paper summarises the historical development of the discipline that is now called 'chemoinformatics'. It shows how this has evolved, principally as a result of technological developments in chemistry and biology during the past decade, from long-established techniques for the modelling and searching of chemical molecules. A total of 30 papers, the earliest dating back to 1957, are briefly summarised to highlight some of the key publications and to show the development of the discipline.
Keywords: Chemical documentation; Chemical structures; Chemoinformatics; Drug discovery; History; Informatics; Molecules; Pharmaceutical research
Correspondence to: Prof. Peter Willett, Department of Information Studies, University of Sheffield, 211 Portobello Street, Sheffield S1 4DP, UK; p.willett@sheffield.ac.uk. Journal of Information Science, XX (X) 2007, pp. 1-24 © CILIP, DOI: 10.1177/0165551506nnnnnn
Introduction
Chemistry is, and has been for many years, one of the most information-rich academic disciplines. The very first journal devoted to chemistry was Chemisches Journal, which was published 1778-1784 and then, under the name of Chemische Annalen, until 1803 [1]. The growth in the chemical literature during the 19th century led to a recognition of the need for comprehensive abstracting and indexing services for the chemical sciences. The principal such service is Chemical Abstracts Service (CAS), which was established in 1907 and which acts as the central repository for the world's published chemical (and, increasingly, life-sciences) information. The size of this repository is impressive: at the end of its first year of operations, the CAS database contained ca. 12K abstracts; by the end of 2006, this had grown to ca. 25M abstracts with ca. 1M being added each year. Most chemical publications will refer to one or more chemical substances. The structures of these substances form a vitally important part of the chemical literature, and one that distinguishes chemistry from many other disciplines. The CAS Registry System was started in 1965 to provide access to substance information, initially registering just small organic and inorganic molecules but now also registering biological sequences [2]. At the end of 1965 there were ca. 222K substances in the System; by the end of 2006 this had grown to ca. 89M substances, of which ca. one-third were small molecules and the remainder biological sequences, with ca. 1.5M being added each year. There are also many additional molecular structures in public databases such as the Beilstein Database [3], and corporate files, in particular those of the major pharmaceutical, agrochemical and biotechnology companies.
The presence of chemical structures requires very different computational techniques from those used for processing conventional textual information. These specialised techniques, now referred to by the name of chemoinformatics as discussed further below, have developed steadily over the fifty years that have passed since the founding of the I...