Surface-enhanced Raman scattering (SERS) is a valuable analytical technique for the analysis of biological samples. However, due to the nature of SERS, it is often challenging to exploit the generated data to obtain the desired information when no reporter or label molecules are used. Here, the suitability of random forest based approaches is evaluated using SERS data generated by a simulation framework that is also presented. More specifically, it is demonstrated that important SERS signals can be identified, the relevance of predefined spectral groups can be evaluated, and the relations between different SERS signals can be analyzed. It is shown that Boruta and surrogate minimal depth (SMD) should be applied for the selection of important SERS signals, and the competing method Learner of Functional Enrichment (LeFE) for the analysis of spectral groups. In general, this investigation demonstrates that the combination of random forest approaches and SERS data is very promising for sophisticated analysis of complex biological samples.
Surface-enhanced Raman scattering (SERS) is an analytical approach that is capable of studying small structures in biological materials 1 and is even able to detect single molecules 2,3. Because SERS can also be applied as an in vitro analytical tool 4, e.g. to analyze living cells 5,6, it has the potential to become the next generation sensor technology to monitor cells and tissues 7. Hence, SERS has been widely applied, for example to study blood 8, bacteria 9, viruses 10, and cancer 11, and to develop a pH sensor in living cells 12. SERS analyzes the local environment of nanoparticles that are utilized as nanoprobes, which can result in very diverse SERS spectra in environments with many different biomolecules. Hence, one of the main challenges of biological SERS applications is the question of how to obtain reliable and interpretable results.
One possible solution is the application of SERS labels, i.e. nanoparticles that are combined with functionalized reporter molecules for specific binding and, hence, yield more reproducible and specific SERS spectra 7. However, in this case, usually only the signals of the reporter molecules are detected. Another approach, which can also be applied to the spectra of reporter molecules, is the analysis of the SERS data with multivariate statistical methods. This combination has, for example, been used to classify bacteria 13 and for cell imaging 14. In this context, unsupervised methods like principal component analysis (PCA) and hierarchical cluster analysis (HCA) are usually applied. However, in label-free SERS experiments it has been shown that variation due to the nature of SERS can hamper analysis with PCA and HCA when biological samples containing multiple molecules are analyzed 15. This can be circumvented when supervised methods like artificial neural networks (ANN) are applied. Although ANNs have been utilized, e.g. to quantify caffeine 16, food dye 17, and metabolite gradients 18 and to discriminate DNA 19, they have not been applied to SERS da...