PCA–LDA scatter plot for Raman spectra of wild-type (circles) and radioresistant (triangles) breast cancer cell lines. A classification accuracy of 100% is achieved in distinguishing radioresistant from wild-type cells for all 198 spectra in the test set (open markers).
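A PCA–LDA classifier like the one behind this plot first compresses each spectrum into a few principal-component scores and then finds the linear discriminant direction that best separates the two classes in that score space. The sketch below is a minimal NumPy version written under stated assumptions, not the pipeline used for the figure. The two-class Fisher LDA, the choice of `n_components=5`, the band positions, the noise level, and the helper names `fit_pca_lda`, `predict_pca_lda`, and `synthetic_spectra` are all hypothetical, and the spectra are synthetic.

```python
import numpy as np

def fit_pca_lda(X, y, n_components):
    """Fit PCA on mean-centred spectra, then a two-class Fisher LDA on the PCA scores."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal axes (loadings), ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                    # (n_wavenumbers, n_components)
    T = Xc @ P                                 # PCA scores
    T0, T1 = T[y == 0], T[y == 1]
    m0, m1 = T0.mean(axis=0), T1.mean(axis=0)
    # Pooled within-class scatter matrix in score space.
    Sw = (T0 - m0).T @ (T0 - m0) + (T1 - m1).T @ (T1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)           # Fisher discriminant direction
    threshold = 0.5 * (m0 + m1) @ w            # midpoint of the projected class means
    return mean, P, w, threshold

def predict_pca_lda(model, X):
    mean, P, w, threshold = model
    # Project onto the PCA loadings, then onto the LDA direction; class 1 projects higher.
    return ((X - mean) @ P @ w > threshold).astype(int)

# Hypothetical synthetic data: two classes whose spectra differ in the position of one band.
rng = np.random.default_rng(0)
axis = np.arange(200)

def synthetic_spectra(n, center):
    band = np.exp(-0.5 * ((axis - center) / 4.0) ** 2)
    return band + 0.05 * rng.standard_normal((n, axis.size))

X = np.vstack([synthetic_spectra(60, 80), synthetic_spectra(60, 120)])
y = np.array([0] * 60 + [1] * 60)
idx = rng.permutation(len(y))
train, test = idx[:80], idx[80:]

model = fit_pca_lda(X[train], y[train], n_components=5)
accuracy = np.mean(predict_pca_lda(model, X[test]) == y[test])
print(f"test accuracy: {accuracy:.2f}")
```

Running PCA before LDA keeps the within-class scatter matrix invertible. Raw spectra usually have far more wavenumber channels than training spectra, so LDA on the raw data would be ill-posed.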
Numerous publications have shown that robust prediction models for microorganisms based on Raman micro-spectroscopy combined with chemometric methods are feasible, often with very precise predictions. Advances in machine learning and easier access to software make it increasingly easy for users to generate predictive models from complex data. However, the question of why those predictions are so accurate receives much less attention. In our work, we use Raman spectroscopic data of fungal spores and carotenoid-containing microorganisms to show that accurate classification often does not rest on the positions of the peaks or on subtle differences in band ratios arising from small differences in the chemical composition of the organisms. Rather, it can rest on characteristic effects on the baselines of the Raman spectra of biochemically similar microorganisms, which certain data pretreatment methods can enhance; even seemingly uninformative spectral regions can be of great importance for a convolutional neural network. Using a method called Gradient-weighted Class Activation Mapping (Grad-CAM), we attempt to peer into the black box of convolutional neural networks in microbiological applications and show which Raman spectral regions are responsible for accurate classification.
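Grad-CAM weights each convolutional feature map by the average gradient of the target class score with respect to that map. It then sums the weighted maps and keeps only the positive part. For a 1D CNN on spectra, the resulting map runs along the wavenumber axis and shows which spectral regions pushed the network toward a class. The sketch below is a minimal NumPy illustration using a hypothetical one-layer toy network, not the network from this work. It assumes a single convolution with ReLU, followed by global average pooling and a linear layer, because for that head the gradient is analytic (`w_fc[c, k] / n_pos`) and no autograd library is needed. The kernels, weights, and function names are illustrative assumptions.

```python
import numpy as np

def conv1d_relu(x, kernels, bias):
    """'Valid' 1D cross-correlation of a spectrum with K kernels, followed by ReLU.

    x: (L,), kernels: (K, W), bias: (K,) -> feature maps A of shape (K, L - W + 1).
    """
    windows = np.lib.stride_tricks.sliding_window_view(x, kernels.shape[1])  # (L-W+1, W)
    return np.maximum(windows @ kernels.T + bias, 0.0).T

def grad_cam_1d(x, kernels, bias, w_fc, b_fc, target_class):
    A = conv1d_relu(x, kernels, bias)          # feature maps A_k(i)
    n_pos = A.shape[1]
    scores = w_fc @ A.mean(axis=1) + b_fc      # global average pooling + linear layer
    # For a GAP + linear head, dy^c / dA_k(i) = w_fc[c, k] / n_pos at every position i.
    grads = np.broadcast_to(w_fc[target_class][:, None] / n_pos, A.shape)
    alpha = grads.mean(axis=1)                 # Grad-CAM channel weights alpha_k^c
    cam = np.maximum(alpha @ A, 0.0)           # ReLU(sum_k alpha_k^c * A_k)
    if cam.max() > 0:
        cam = cam / cam.max()                  # scale to [0, 1] for overlaying on the spectrum
    return scores, cam

# Hypothetical toy model: kernel 0 responds to a positive band, kernel 1 to a negative one.
positions = np.arange(100)
spectrum = np.exp(-0.5 * ((positions - 50) / 2.0) ** 2)
kernels = np.vstack([np.ones(5) / 5, -np.ones(5) / 5])
bias = np.zeros(2)
w_fc = np.eye(2)
b_fc = np.zeros(2)

scores, cam = grad_cam_1d(spectrum, kernels, bias, w_fc, b_fc, target_class=0)
print(cam.argmax())  # 48: the window covering indices 48-52, centred on the band at 50
```

The map indexes feature-map positions, which are offset from spectrum indices by half the kernel width. In a real multi-layer network, the gradients would come from automatic differentiation (for example, framework hooks on the last convolutional layer) rather than from the closed form used here. With a GAP + linear head, Grad-CAM reduces to plain Class Activation Mapping up to a constant scale.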