Model evaluation is a cornerstone of scientific research as it represents the findings' accuracy and model performance. A case study is commonly used in evaluating software engineering models. Due to criticism in terms of generalization from a single case study and testers, deciding on the number of case studies used for evaluation and the number of testers has been one of the researchers’ challenges. Multiple case studies with multiple testers can be difficult in some domains, such as penetration testing, due to the complexity and time needed to prepare test cases. This study aims to review the literature and examine the evaluation methods used pertaining to the number of case studies and testers involved. This study is beneficial for researchers, students, and penetration testers as it provides case study design steps that are useful to determine the appropriate number of test cases and testers required. The paper's findings and novelty highlight that a single case study with a single tester is enough to evaluate a model. It also strikes a balance between what is enough for the evaluation and the need to reduce criticisms of a single case study by using two case studies with a single tester. Doi: 10.28991/ESJ-2023-07-03-025 Full Text: PDF