Breast cancer (BC), or breast neoplasm, poses a major threat to the lives of women around the world. Early detection and staging of BC are central to the diagnostic protocol. This work aims to develop an automated system that combines multivariate data analysis (principal component analysis, PCA) with ensemble recurrent neural network models (stacked OGRU-LSTM) to identify Raman spectral characteristics that can serve as spectral cancer markers for detecting BC progression and staging. In this study, features of blood plasma from histopathologically diagnosed BC candidates were compared with those of healthy individuals. The same analysis was carried out with other leading classification models, namely the stacked basic RNN, the stacked RNN-LSTM, and the RNN-GRU models. A total of 2,340 Raman spectra were evaluated. The study finds that stage 2 and stage 3 are structurally identical but can be distinguished from each other with PCA-Factorial Discriminant Analysis (FDA). The Raman spectra of blood plasma samples from the BC candidates are therefore classified efficiently, yielding potentially high specificity and sensitivity for all BC stages. Comparative classification results show that the stacked OGRU-LSTM model performs best for BC detection and, combined with the multivariate data analysis technique, differentiates the various stages of BC better than the other models. The stacked OGRU-LSTM model achieved the highest classification accuracy (97.89%), Cohen's kappa score (0.928), and F1-score (0.957), and the lowest test loss and MSE (0.037), indicating that it outperforms the baseline classifiers.
HIGHLIGHTS
Raman spectroscopy is used in conjunction with deep learning models and multivariate data analysis to classify blood plasma samples as cancerous or noncancerous and to stage breast cancer based on their chemical composition
Spectral data augmentation techniques were implemented to address the underfitting and overfitting caused by insufficient Raman spectral data
This technique has the potential to classify breast cancer samples accurately and hence reduce the number of unnecessary excisional breast biopsies
Stage 2 and stage 3 of breast cancer were found to be structurally identical but can be distinguished from each other using PCA-Factorial Discriminant Analysis, with high specificity and sensitivity for all BC stages
The stacked OGRU-LSTM model outperformed the other baseline classifiers for breast cancer detection and, by employing a multivariate data analysis technique, better differentiated the various stages of breast cancer
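The classifiers are compared on accuracy, Cohen's kappa, and F1-score. As a reading aid, the following minimal sketch shows how these three scores could be computed from true and predicted stage labels. It is not the study's evaluation code: the function names, the toy labels, and the choice of macro averaging for the F1-score are assumptions made for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement between true and predicted labels, corrected for chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal label frequencies per class.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels (0 = healthy, 1 = stage 2, 2 = stage 3); not study data.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

print("accuracy:", accuracy(y_true, y_pred))
print("kappa:   ", cohen_kappa(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred))
```

Cohen's kappa discounts the agreement expected by chance from the class frequencies, which is why it is usually lower than accuracy on the same predictions.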
GRAPHICAL ABSTRACT