Binding affinity prediction is pivotal in drug design, offering insight into the interactions between ligands and protein targets and thereby strongly influencing the drug development pipeline. Its potential to expedite the identification of drug candidates has motivated extensive research on machine learning algorithms for predicting binding affinity. However, most of this work assumes independently and identically distributed (i.i.d.) data. In real-world scenarios, prediction models may encounter novel chemical substructures, protein families absent from the training set, variations in experimental conditions, and evolving drug resistance mechanisms. These factors can cause a marked degradation in performance, leading models to suggest suboptimal compounds or overlook promising candidates, challenges commonly referred to as Out-of-Domain (OOD) in the machine learning community. Several benchmarks have been introduced to address OOD challenges in binding affinity algorithm development; however, we observe that many lack a convenient codebase for swift algorithm evaluation. In this paper, building upon the DrugOOD dataset, we introduce a comprehensive benchmarking framework to assess the resilience and adaptability of OOD algorithms in binding affinity prediction. Our framework offers a streamlined approach for evaluating algorithmic performance in OOD scenarios. Furthermore, we propose a method that surpasses existing state-of-the-art approaches in our benchmark tests. We anticipate that our contributions will spur further research on OOD challenges and enhance the reliability and robustness of binding affinity predictions in drug design. Code available at: https://github.com/zehanzz/BioFrontierOOD.git