Library searches of mass spectra are an important part of mass spectrum interpretation and constitute the essential part of a mass spectral expert system. Considerable effort [1][2][3][4][5][6][7][8][9][10][11] has been devoted to this field, and Hertz [2] has given a particularly thorough discussion. So far, two library search strategies have been widely accepted, the forward search and the reverse search [3], and the latter has been strongly recommended [7]. Nevertheless, there is still room for improvement in library searches of mass spectra. To our knowledge, at least two problems in this field remain unsolved: 1) the spectrum of the unknown is not in the library, but spectra of structurally similar molecules are; and 2) the spectrum of the unknown is not in the library, and no similar spectra are present either. The second problem concerns the structural interpretation of mass spectra and is not discussed in this paper.

The aim of this paper is to address problem 1. To our knowledge, if the spectrum of the unknown compound is in the spectral library, existing methods match it to the correct reference spectrum without difficulty. If the spectrum of the unknown is not in the library, however, the search results are often unsatisfactory. In our opinion, a valid algorithm should have two features: 1) it picks out the correct compound when the spectrum of the unknown is in the mass spectral library; and 2) it picks out the most structurally similar compound when the spectrum of the unknown is not in the library. A similarity search of this kind can then assist the manual or machine interpretation of the mass spectrum of the unknown. To achieve this, the correlation between spectral similarity and structural similarity was first clarified, and a novel yet simple similarity index was then developed on that basis. Finally, experiments were conducted and the results were compared with those obtained from a commercial Shimadzu instrument in order to demonstrate the superiority of our method.
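The difference between the forward and reverse strategies mentioned above can be illustrated with a minimal sketch. It assumes a spectrum is stored as a dict mapping integer m/z to relative abundance and uses the cosine (normalized dot product) as the match score; these choices, and the function names `cosine`, `forward_score` and `reverse_score`, are illustrative assumptions, not the scoring used in the cited works or by any commercial search program. The only point shown is which peaks each strategy compares.

```python
import math

# Assumed representation: integer m/z -> relative abundance.
Spectrum = dict[int, float]


def cosine(a: Spectrum, b: Spectrum, positions) -> float:
    """Cosine similarity of two spectra restricted to the given m/z positions."""
    dot = sum(a.get(mz, 0.0) * b.get(mz, 0.0) for mz in positions)
    norm_a = math.sqrt(sum(a.get(mz, 0.0) ** 2 for mz in positions))
    norm_b = math.sqrt(sum(b.get(mz, 0.0) ** 2 for mz in positions))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def forward_score(unknown: Spectrum, reference: Spectrum) -> float:
    # Forward search: peaks of both spectra are compared, so extra peaks
    # in the unknown (e.g. from an impurity) lower the score.
    return cosine(unknown, reference, set(unknown) | set(reference))


def reverse_score(unknown: Spectrum, reference: Spectrum) -> float:
    # Reverse search: only the reference peaks are looked up in the
    # unknown; peaks present only in the unknown are ignored.
    return cosine(unknown, reference, set(reference))


unknown = {43: 100, 58: 30, 71: 50}   # hypothetical unknown with an extra peak at m/z 71
reference = {43: 100, 58: 30}         # hypothetical library entry
print(round(forward_score(unknown, reference), 3))  # 0.902
print(round(reverse_score(unknown, reference), 3))  # 1.0
```

Because the reverse search ignores the extra peak at m/z 71, it still reports a perfect match, whereas the forward search penalizes it.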
Theory and Methodology
Mass spectra and structural similarity

A similarity index is needed to assess structural similarity. It is well known that molecules of similar structure generally give similar mass spectra, and this principle is what makes library searching feasible. If the similarity index between two spectra is close to 100% (for instance, above 90%), the two molecules are taken to be quite similar in structure; otherwise, their structures are regarded as different.

A new matching algorithm for library searches of mass spectra is presented in this paper. The algorithm is based on the substructure similarity of substances and emphasizes m/z positions rather than abundance values. Thirty-two spectra of compounds with molecular weights below 200 were randomly selected from a mass spectral library of 61,993 spectra and used as library search targets to demonstrate the applicability of this algorithm. The results show that the...
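The paper's own index is not given in this excerpt. As a purely hypothetical illustration of what "emphasizing m/z positions rather than abundance values" can mean, the sketch below scores two spectra by the percentage of significant peak positions they share (a Jaccard index on m/z sets) and applies the 90% threshold mentioned above. The 5% base-peak cutoff and the names `peak_positions`, `position_similarity` and `structurally_similar` are assumptions made for this example only.

```python
# Hypothetical position-based similarity index; NOT the index developed in this paper.
Spectrum = dict[int, float]  # assumed: integer m/z -> relative abundance


def peak_positions(spectrum: Spectrum, min_rel_abundance: float = 5.0) -> set[int]:
    """m/z values whose abundance is at least min_rel_abundance % of the base peak."""
    if not spectrum:
        return set()
    base = max(spectrum.values())
    return {mz for mz, ab in spectrum.items() if ab >= min_rel_abundance / 100.0 * base}


def position_similarity(unknown: Spectrum, reference: Spectrum,
                        min_rel_abundance: float = 5.0) -> float:
    """Percentage of shared significant m/z positions (Jaccard index x 100).

    Abundances only decide whether a peak counts as significant; they do not
    otherwise affect the score.
    """
    a = peak_positions(unknown, min_rel_abundance)
    b = peak_positions(reference, min_rel_abundance)
    if not a and not b:
        return 0.0
    return 100.0 * len(a & b) / len(a | b)


def structurally_similar(unknown: Spectrum, reference: Spectrum,
                         threshold: float = 90.0) -> bool:
    # The 90% threshold follows the example value given in the text.
    return position_similarity(unknown, reference) >= threshold


# Same peak positions, very different abundances: still a 100% match.
print(position_similarity({43: 100, 58: 30}, {43: 60, 58: 100}))  # 100.0
```

A cosine score would rate that last pair well below 100%, which is the kind of abundance sensitivity a position-based index is meant to reduce.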