Patients with the target gene mutation frequently derive significant clinical benefits from target therapy. However, differences in the abundance level of mutations among patients resulted in varying survival benefits, even among patients with the same target gene mutations. Currently, there is a lack of rational and interpretable models to assess the risk of treatment failure. In this study, we investigated the underlying coupled factors contributing to variations in medication sensitivity and established a statistically interpretable framework, named SAFE-MIL, for risk estimation. We first constructed an effectiveness label for each patient from the perspective of exploring the optimal grouping of patients’ positive judgment values and sampled patients into 600 and 1,000 groups, respectively, based on multi-instance learning (MIL). A novel and interpretable loss function was further designed based on the Hosmer-Lemeshow test for this framework. By integrating multi-instance learning with the Hosmer-Lemeshow test, SAFE-MIL is capable of accurately estimating the risk of drug treatment failure across diverse patient cohorts and providing the optimal threshold for assessing the risk stratification simultaneously. We conducted a comprehensive case study involving 457 non-small cell lung cancer patients with EGFR mutations treated with EGFR tyrosine kinase inhibitors. Results demonstrate that SAFE-MIL outperforms traditional regression methods with higher accuracy and can accurately assess patients’ risk stratification. This underscores its ability to accurately capture inter-patient variability in risk while providing statistical interpretability. SAFE-MIL is able to effectively guide clinical decision-making regarding the use of drugs in targeted therapy and provides an interpretable computational framework for other patient stratification problems. The SAFE-MIL framework has proven its effectiveness in capturing inter-patient variability in risk and providing statistical interpretability. It outperforms traditional regression methods and can effectively guide clinical decision-making in the use of drugs for targeted therapy. SAFE-MIL offers a valuable interpretable computational framework that can be applied to other patient stratification problems, enhancing the precision of risk assessment in personalized medicine. The source code for SAFE-MIL is available for further exploration and application at https://github.com/Nevermore233/SAFE-MIL.