Analyzing stochastic algorithms for comprehensive performance and comparison across diverse contexts is essential. By evaluating and adjusting algorithm effectiveness across a wide spectrum of test functions, including both classical benchmarks and CEC-C06 2019 conference functions, distinct patterns of performance emerge. In specific situations, underscoring the importance of choosing algorithms contextually. Additionally, researchers have encountered a critical issue by employing a statistical model randomly to determine significance values without conducting other studies to select a specific model for evaluating performance outcomes. To address this concern, this study employs rigorous statistical testing to underscore substantial performance variations between pairs of algorithms, thereby emphasizing the pivotal role of statistical significance in comparative analysis. It also yields valuable insights into the suitability of algorithms for various optimization challenges, providing professionals with information to make informed decisions. This is achieved by pinpointing algorithm pairs with favorable statistical distributions, facilitating practical algorithm selection. The study encompasses multiple nonparametric statistical hypothesis models, such as the Wilcoxon rank-sum test, single-factor analysis, and two-factor ANOVA tests. This thorough evaluation enhances our grasp of algorithm performance across various evaluation criteria. Notably, the research addresses discrepancies in previous statistical test findings in algorithm comparisons, enhancing result reliability in the later research. The results proved that there are differences in significance results, as seen in examples like Leo versus the FDO, the DA versus the WOA, and so on. It highlights the need to tailor test models to specific scenarios, as p-value outcomes differ among various tests within the same algorithm pair.