Background
In biostatistics, assessing the fragility of research findings is crucial for understanding their clinical significance. This study focuses on the fragility index, unit fragility index, and relative risk index as measures of statistical fragility. The relative risk index quantifies the deviation of observed findings from therapeutic equivalence. In contrast, the fragility indices assess how readily a p-value crosses the significance threshold when a small number of outcomes in a 2×2 contingency table are altered. While the fragility indices have intuitive appeal and have been widely applied, their behavior across a wide range of contingency tables has not been rigorously evaluated.

Methods
A simulation written in Python generated random 2×2 contingency tables. Only tables with p < 0.05 according to Fisher’s exact test were included. The fragility indices and the relative risk index were then calculated for each table. To account for variation in sample size, fragility and risk quotients were also calculated. A correlation matrix was used to assess the collinearity between each metric and the p-value.

Results
The analysis included 2,000 contingency tables with cell counts ranging from 20 to 480. Notably, the formulas for calculating the fragility indices encountered limitations when cell counts approached zero or when duplicate cell counts prevented standardized application. The correlation coefficients with the p-value were as follows: unit fragility index (-0.806), fragility index (-0.802), fragility quotient (-0.715), unit fragility quotient (-0.695), relative risk index (-0.403), and relative risk quotient (-0.261).

Conclusion
Among tables with p < 0.05, the fragility indices and their quotients correlated more strongly with the p-value than did the relative risk index and quotient. This implies that the fragility indices offer limited additional information beyond the p-value alone. In contrast, the relative risk index is relatively independent of the p-value, suggesting that it provides meaningful insight into statistical fragility by assessing how far observed findings deviate from therapeutic equivalence, regardless of the p-value.
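
The simulation described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' program. It assumes the classical fragility index (non-events are converted to events, one at a time, in the group with the lower event rate until Fisher's exact p-value reaches 0.05), a fragility quotient defined as the index divided by the total sample size, and uniform random sampling of the four cell counts. The function names `fisher_exact_p`, `fragility_index`, `pearson`, and `simulate` are hypothetical. The unit fragility index and the relative risk index are omitted because their formulas are not given here.

```python
import random
from math import comb, sqrt


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    denom = comb(r1 + r2, c1)

    def prob(x):
        # Hypergeometric probability of x events in row 1 with margins fixed.
        return comb(r1, x) * comb(r2, c1 - x) / denom

    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def fragility_index(a, b, c, d, alpha=0.05):
    """Number of non-event -> event flips until p >= alpha.

    Assumption: flips are made in the row with the lower event rate.
    Returns 0 if the table is not significant to begin with, and None if
    that row runs out of non-events before significance is lost.
    """
    flip_first_row = a * (c + d) <= c * (a + b)  # row 1 has the lower rate
    flips = 0
    while fisher_exact_p(a, b, c, d) < alpha:
        if flip_first_row:
            if b == 0:
                return None
            a, b = a + 1, b - 1
        else:
            if d == 0:
                return None
            c, d = c + 1, d - 1
        flips += 1
    return flips


def pearson(xs, ys):
    """Pearson correlation coefficient (undefined if either input is constant)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate(n_tables, low=20, high=480, alpha=0.05, seed=0):
    """Collect n_tables random tables with p < alpha and their metrics.

    Assumption: each cell is drawn uniformly from [low, high].
    """
    rng = random.Random(seed)
    rows = []
    while len(rows) < n_tables:
        a, b, c, d = (rng.randint(low, high) for _ in range(4))
        p = fisher_exact_p(a, b, c, d)
        if p >= alpha:
            continue  # keep only statistically significant tables
        fi = fragility_index(a, b, c, d, alpha)
        if fi is None:
            continue  # index not computable for this table
        rows.append({"table": (a, b, c, d), "p": p,
                     "fi": fi, "fq": fi / (a + b + c + d)})
    return rows


if __name__ == "__main__":
    # Small demonstration run; smaller cells than in the study keep it fast.
    rows = simulate(n_tables=20, low=5, high=60)
    p_values = [r["p"] for r in rows]
    print("r(p, FI):", pearson(p_values, [r["fi"] for r in rows]))
    print("r(p, FQ):", pearson(p_values, [r["fq"] for r in rows]))
```

For example, `fragility_index(1, 9, 9, 1)` returns 3: the p-value stays below 0.05 after one and two flips and rises above it at the third. The sketch uses only the standard library; with the study's cell range of 20 to 480 it becomes slow, because every flip recomputes the exact test.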