Verifying that a given HPC optimization does not degrade the accuracy of a non-linear model is challenging. Any apparently insignificant change in the code, in the software stack, or in the HPC system used can prevent bit-to-bit reproducibility, which, combined with the intrinsic nonlinearities of the model, can lead to differences in the simulation results. Being able to determine whether such differences can be explained by the internal variability of the model helps to decide whether a specific change is acceptable. This manuscript presents a method that estimates the uncertainty of the model outputs by running many almost-identical simulations in which the model inputs are slightly perturbed. The statistical information extracted from this ensemble can then be used to discern whether the results of a given simulation are statistically indistinguishable from it or instead show significant differences. Two illustrative examples of the method are provided: the first studies whether a Lorenz system model can use reduced numerical precision, and the second studies whether the state-of-the-art ocean model NEMO can safely use certain compiler optimization flags.
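As a minimal sketch of the kind of ensemble-based verification described above (not the manuscript's exact implementation), the following Python example builds an ensemble of Lorenz-63 runs whose initial conditions are perturbed at round-off level, and then checks whether a "modified" run, here simply the same model executed in single precision, falls within the ensemble spread of a scalar diagnostic. The RK4 integrator, the time-mean diagnostic, and the percentile-based acceptance check are illustrative assumptions, not choices taken from the manuscript.

```python
# Sketch: ensemble of perturbed Lorenz-63 runs vs. a reduced-precision run.
import numpy as np

def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the classic Lorenz-63 system."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z],
                    dtype=state.dtype)

def integrate(x0, dt=0.01, n_steps=5000, dtype=np.float64):
    """Integrate with fourth-order Runge-Kutta; return the trajectory of x."""
    state = np.asarray(x0, dtype=dtype)
    dt = dtype(dt)
    xs = np.empty(n_steps, dtype=dtype)
    for i in range(n_steps):
        k1 = lorenz63(state)
        k2 = lorenz63(state + 0.5 * dt * k1)
        k3 = lorenz63(state + 0.5 * dt * k2)
        k4 = lorenz63(state + dt * k3)
        state = state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        xs[i] = state[0]
    return xs

def diagnostic(xs, spinup=1000):
    """Illustrative scalar output: time-mean of x after a spin-up period."""
    return float(np.mean(xs[spinup:]))

rng = np.random.default_rng(0)
x0 = np.array([1.0, 1.0, 1.0])

# Ensemble of almost-identical double-precision runs: the initial condition
# is perturbed at the level of its least significant bits.
ensemble = [diagnostic(integrate(x0 + rng.normal(scale=1e-10, size=3)))
            for _ in range(20)]

# "Modified" run to verify: the same model executed in single precision.
candidate = diagnostic(integrate(x0, dtype=np.float32))

# Simple acceptance check: is the candidate inside the central 95% of the ensemble?
lo, hi = np.percentile(ensemble, [2.5, 97.5])
print(f"ensemble 95% interval: [{lo:.3f}, {hi:.3f}], candidate: {candidate:.3f}")
print("indistinguishable from internal variability" if lo <= candidate <= hi
      else "significant difference")
```

In the NEMO use case from the abstract, the single-precision run would be replaced by a binary built with the compiler optimization flags under test, and the scalar diagnostic by the model outputs of interest; the statistical test itself is the manuscript's subject and may differ from the simple percentile check used here.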