Early screening and precise staging are crucial for reducing mortality in patients with nasopharyngeal carcinoma (NPC). This study aimed to assess the performance of blood protein surface-enhanced Raman scattering (SERS) spectroscopy, combined with deep learning, for the precise detection of NPC. A highly efficient protein SERS analysis, based on a membrane purification technique and super-hydrophobic platform, was developed and applied to blood samples from 1164 subjects, including 225 healthy volunteers, 120 stage I, 249 stage II, 291 stage III, and 279 stage IV NPC patients. The proteins were rapidly purified from only 10 µL of blood plasma using the membrane purification technique. Then, the super-hydrophobic platform was prepared to pre-concentrate tiny amounts of proteins by forming a uniform deposition to provide repeatable SERS spectra. A total of 1164 high-quality protein SERS spectra were rapidly collected using a self-developed macro-Raman system. A convolutional neural network-based deep-learning algorithm was used to classify the spectra. An accuracy of 100% was achieved for distinguishing between the healthy and NPC groups, and accuracies of 96%, 96%, 100%, and 100% were found for the differential classification among the four NPC stages. This study demonstrated the great promise of SERS- and deep-learning-based blood protein testing for rapid, non-invasive, and precise screening and staging of NPC.
Surface‐enhanced Raman spectroscopy (SERS) has shown great promise for cancer screening. However, previous “proof‐of‐concept” studies ignored the natural imbalance of cancer types in the population, which biases the model toward learning features of the majority classes at the expense of the minority classes. Herein, a power‐law‐based synthetic minority oversampling technique (PL‐SMOTE) is proposed to guide the resampling of multiclass serum SERS data by analyzing the long‐tailed (power‐law) distribution of cancer prevalence in the population. By introducing a modulating factor, PL‐SMOTE balances the number of minority samples to resample against the amount of overlap between classes. Modeling on resampled datasets synthesized by PL‐SMOTE verifies the effectiveness of the method. After further fine‐tuning of the parameters of the deep neural network model and of PL‐SMOTE, an optimal cancer screening model is obtained, with a macroaveraged Recall of 97.24% and a macroaveraged F2‐Score of 97.38%. This provides a new method for multiclass imbalanced resampling that significantly improves model performance in SERS cancer screening. The method may also inform other multiclass imbalanced scenarios, such as biomedicine, anomaly detection, and disaster prediction.
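The macroaveraged Recall and F2‐Score reported above weight every class equally, which is why they suit imbalanced screening data. The following is a minimal sketch of how such scores are commonly computed, not the authors' code: the function name `macro_recall_fbeta` and the toy labels are hypothetical, and it assumes the common convention of computing the F‐beta score per class and then averaging, which the abstract does not confirm.

```python
from collections import Counter


def macro_recall_fbeta(y_true, y_pred, beta=2.0):
    """Return (macro recall, macro F-beta) for multiclass labels.

    Hypothetical helper for illustration. Per-class scores are computed
    one-vs-rest and then averaged with equal weight per class.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fn[t] += 1  # the true class was missed
            fp[p] += 1  # the predicted class gained a false alarm

    b2 = beta * beta
    recalls, fbetas = [], []
    for c in labels:
        actual = tp[c] + fn[c]
        predicted = tp[c] + fp[c]
        recall = tp[c] / actual if actual else 0.0
        precision = tp[c] / predicted if predicted else 0.0
        denom = b2 * precision + recall
        # F-beta with beta = 2 weights recall more heavily than precision.
        fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        fbetas.append(fbeta)

    n = len(labels)
    return sum(recalls) / n, sum(fbetas) / n


# Toy imbalanced example: four majority samples, two minority samples.
y_true = ["major", "major", "major", "major", "minor", "minor"]
y_pred = ["major", "major", "major", "minor", "minor", "minor"]
macro_recall, macro_f2 = macro_recall_fbeta(y_true, y_pred, beta=2.0)
```

With beta = 2, a missed cancer case (a false negative) lowers the score more than a false alarm does, which matches the priorities of a screening setting.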