Continuous mixture interpretation methods that employ probabilistic genotyping to compute the Likelihood Ratio (LR) utilize more information than threshold-based systems. The continuous interpretation schemes described in the literature, however, do not all use the same underlying probabilistic model and standards outlining which probabilistic models may or may not be implemented into casework do not exist; thus, it is the individual forensic laboratory or expert that decides which model and corresponding software program to implement. For countries, such as the United States, with an adversarial legal system, one can envision a scenario where two probabilistic models are used to present the weight of evidence, and two LRs are presented by two experts. Conversely, if no independent review of the evidence is requested, one expert using one model may present one LR as there is no standard or guideline requiring the uncertainty in the LR estimate be presented. The choice of model determines the underlying probability calculation, and changes to it can result in non-negligible differences in the reported LR or corresponding verbal categorization presented to the trier-of-fact. In this paper, we study the impact of model differences on the LR and on the corresponding verbal expression computed using four variants of a continuous mixture interpretation method. The four models were tested five times each on 101, 1-, 2- and 3-person experimental samples with known contributors. For each sample, LRs were computed using the known contributor as the person of interest. In all four models, intra-model variability increased with an increase in the number of contributors and with a decrease in the contributor’s template mass. Inter-model variability in the associated verbal expression of the LR was observed in 32 of the 195 LRs used for comparison. Moreover, in 11 of these profiles there was a change from LR > 1 to LR < 1. These results indicate that modifications to existing continuous models do have the potential to significantly impact the final statistic, justifying the continuation of broad-based, large-scale, independent studies to quantify the limits of reliability and variability of existing forensically relevant systems.