Background
Immune checkpoint inhibitors (ICIs) have revolutionized cancer therapy, particularly in melanoma, by harnessing the body's immune system to target and eliminate tumor cells. However, only a subset of patients responds to treatment. Understanding patients' response to ICIs remains a critical challenge in cancer research because of the complexity and variability of immune interactions within the tumor microenvironment. Traditional bulk sequencing approaches miss important aspects of the microenvironment because they lack compartmental resolution. Our study therefore leverages single-cell RNA sequencing data and machine learning techniques to predict ICI response while preserving the richness of single-cell information and keeping the results interpretable.

Methods
We used a dataset of melanoma-infiltrating immune cells and applied the XGBoost algorithm to predict patient response to ICI. Each cell was labeled with the response of its sample of origin, and the resulting cell-level classifications were aggregated in a leave-one-out cross-validation scheme. To enhance model performance, we applied Boruta feature selection, which identified key predictive genes. Cell-type-specific predictions were made to evaluate the contribution of each cellular group and to improve model accuracy. SHAP values were then used to extract detailed information, including gene-pair interactions and their conditional effects on response. Additionally, we developed a novel reinforcement learning model to quantify predictivity at the single-cell level.

Results
The initial analysis achieved an AUC of 0.84, which improved to 0.89 after Boruta feature selection and led to the identification of an 11-gene predictive signature. Cell-type-specific classification identified T cell clusters as significant contributors to the immune response.
SHAP value analysis further elucidated gene behaviors and interactions, providing insights into their effects on model predictions. Finally, reinforcement learning-derived cell scores were used to filter cells and improve prediction accuracy. These gene- and cell-based signatures were highly predictive across independent datasets, including those from lung, breast, brain, and skin cancers.

Conclusions
Our approach demonstrates the potential of sophisticated computational methods, including machine learning and reinforcement learning, to enhance the understanding of cancer immunity and improve the prediction of treatment responses using single-cell data.
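
The cell-labeling and aggregation scheme described in Methods can be illustrated with a minimal sketch. It is not the study's implementation and rests on several assumptions: the data are synthetic, a simple nearest-centroid classifier stands in for XGBoost so that the example runs with the standard library alone, the aggregation rule (fraction of a sample's cells predicted as responder) is one plausible choice that the abstract does not specify, and all function and variable names are hypothetical.

```python
import random
from collections import defaultdict


def fit_centroids(rows, labels):
    """Stand-in classifier (assumption, not XGBoost): per-class mean vector."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_cell(centroids, x):
    """Return 1 if the cell is closer to the responder centroid, else 0."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return int(sq_dist(centroids[1]) < sq_dist(centroids[0]))


def leave_one_sample_out(cells, sample_ids, sample_response):
    """Hold out one sample at a time; score it by its fraction of responder-like cells."""
    scores = {}
    for held_out in sorted(set(sample_ids)):
        # Every training cell inherits the response label of its sample.
        train_x = [x for x, s in zip(cells, sample_ids) if s != held_out]
        train_y = [sample_response[s] for s in sample_ids if s != held_out]
        model = fit_centroids(train_x, train_y)
        test_x = [x for x, s in zip(cells, sample_ids) if s == held_out]
        votes = [predict_cell(model, x) for x in test_x]
        scores[held_out] = sum(votes) / len(votes)  # assumed aggregation rule
    return scores


def auc(scores, sample_response):
    """Sample-level AUC: probability that a responder outscores a non-responder."""
    pos = [scores[s] for s in scores if sample_response[s] == 1]
    neg = [scores[s] for s in scores if sample_response[s] == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic toy data: 8 samples, 30 cells each, 5 "genes" per cell.
rng = random.Random(0)
sample_response = {f"S{i}": i % 2 for i in range(8)}
cells, sample_ids = [], []
for sid, resp in sample_response.items():
    for _ in range(30):
        # Responder cells are shifted upward so the toy signal is learnable.
        cells.append([rng.gauss(2.0 * resp, 1.0) for _ in range(5)])
        sample_ids.append(sid)

scores = leave_one_sample_out(cells, sample_ids, sample_response)
sample_auc = auc(scores, sample_response)
print(scores)
print("sample-level AUC:", sample_auc)
```

Holding out all cells of a sample together keeps cells from the same patient out of both the training and test sets, which is the point of performing the cross-validation at the sample level rather than the cell level. In practice the stand-in classifier would be replaced by a gradient-boosted model, and the feature columns by the selected genes.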