Tools for mineral identification based on Raman spectroscopy fall into two groups: those that rely largely on fits to diagnostic peaks associated with specific phases, and those that use the entire spectral range for multivariate analyses. In this project, we apply machine learning techniques to improve mineral identification within the latter group. We test the effects of common spectrum preprocessing steps, such as intensity normalization, smoothing, and squashing, and find that the last is superior. Next, we demonstrate that full-spectrum matching algorithms exhibit excellent performance in classification tasks, without requiring time-intensive dimensionality reduction or model training. This class of algorithms supports both vector and trajectory input formats, exploiting all available spectral information. Combining these insights, we find that optimal mineral spectrum matching performance can be achieved using careful preprocessing and a weighted-neighbors classifier based on a vector similarity metric.
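As a rough illustration of the pipeline summarized above, the following minimal sketch combines squashing, intensity normalization, and a similarity-weighted nearest-neighbors vote. Several choices here are assumptions made for illustration and are not taken from the text: the square root as the squashing function, L2 normalization, cosine similarity as the vector similarity metric, the function names `preprocess`, `classify`, and `gaussian_peak`, and the synthetic single-peak spectra with placeholder labels.

```python
import numpy as np

def preprocess(spectrum):
    """Squash intensities with a square root (assumed), then scale to unit L2 norm."""
    s = np.sqrt(np.clip(np.asarray(spectrum, dtype=float), 0.0, None))
    norm = np.linalg.norm(s)
    return s / norm if norm > 0 else s

def classify(query, library, labels, k=3):
    """Similarity-weighted k-nearest-neighbors vote over full spectra."""
    q = preprocess(query)
    lib = np.array([preprocess(s) for s in library])
    sims = lib @ q  # cosine similarity, because every row has unit norm
    nearest = np.argsort(sims)[::-1][:k]
    votes = {}
    for i in nearest:
        # Each neighbor votes for its label, weighted by its similarity.
        votes[labels[i]] = votes.get(labels[i], 0.0) + float(sims[i])
    return max(votes, key=votes.get)

def gaussian_peak(x, center, width=8.0):
    """Synthetic single-peak spectrum, used only to exercise the sketch."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

# Hypothetical library: all spectra share one wavenumber grid (vector format).
x = np.linspace(100.0, 1400.0, 651)
library = [
    gaussian_peak(x, 464.0),
    0.5 * gaussian_peak(x, 466.0),
    gaussian_peak(x, 1086.0),
    2.0 * gaussian_peak(x, 1088.0),
]
labels = ["phase_A", "phase_A", "phase_B", "phase_B"]

query = 3.0 * gaussian_peak(x, 465.0)
predicted = classify(query, library, labels, k=3)
print(predicted)
```

The sketch needs no training step: every library spectrum is compared directly against the query, which is the property that distinguishes full-spectrum matching from approaches that first fit a model or reduce dimensionality.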
Introduction
Use of Raman spectroscopy in the geosciences is growing rapidly, as evidenced by a burgeoning number of publications in the fields of geology, cultural anthropology, environmental science, and, in particular, planetary science. Raman spectrometers have been proposed for exploration of a diverse range of extraterrestrial targets including asteroids [1], Europa [2,3], Mars [4-6], the Moon [7], and Venus [8]. A Raman laser spectrometer (RLS) is part of the science instrument payload of the European Space Agency 2018 ExoMars mission; the RLS instrument will target mineralogical and astrobiological investigations on the surface and subsurface of Mars [9,10]. Raman will also be used on the upcoming NASA Mars 2020 mission as part of the SuperCam and SHERLOC instruments [11]. What all these applications have in common is their dependence on software and mineralogical databases for phase identification and quantification of the relative abundances of mineral components. Because of the structural diversity and chemical complexity of naturally occurring minerals, optimal applications for these purposes require substantial further work on the development of appropriate software and mineral databases.
In practice, users commonly depend on a combination of matching software distributed by spectrometer manufacturers and one-on-one comparisons with database spectra for their identifications, but these types of identification have two critical limitations. First, identifications are only as good as the databases used to match them. No database can be entirely comprehensive, so it cannot be unequivocally claimed that any spectrum is a 'perfect match' to exactly one other phase.
The RRUFF database was founded in 2006 at the University of Arizona by Robert Downs to remedy this situation by providing coverage of all known mineral species [12]. It has quickly become the preeminent resource available for Raman spectra of minerals.
RRUFF currently contains over 20 378 spectra acquired at several different laser wavelengths from orien...