Editor,
When I learned that two programs for probabilistic genotyping (PG) had produced widely different results for the same case, I wanted to understand how that could have happened. My article [1] about the case was a first cut at explaining an event that I found extraordinary. In their comment, Buckleton et al. [2] suggest that there is really nothing surprising about what I observed. By their account, the two programs are similar and produce different outputs only because they received different inputs due to differing analytic thresholds. To the forensic science community, they seem to be saying: "move along folks, there is nothing to see here." They appear to agree with some of my points about reporting PG results, but they suggest that the attention I devoted to comparing the performance of STRMix and TrueAllele in this case "was not warranted" [2].
Yet there is far more going on here than "different inputs producing different outputs." It is worth repeating that the two programs were analyzing the same case using the same data file. While the analytic thresholds the analysts chose to apply were different, there is no way to know which threshold was more appropriate, and hence no way of knowing which of the widely different results is more trustworthy. The suggestion that the programs actually led to the same "interpretation" because both produced exculpatory results is unconvincing when the likelihood ratios (LRs) supporting that interpretation differed by five to six orders of magnitude.
While DNA evidence ultimately played a minor role in the case that I discussed, DNA evidence of a similar type (based on "very low information content profiles") is playing an increasingly important role in the legal system. Hence, the conflicting results and approaches to analysis and reporting that occurred in this case will certainly re-occur in other cases.
Indeed, in a non-peer-reviewed online posting, Mark Perlin and his employees assert that "Cybergenetics has re-analyzed STRMix 'inconclusive' or weak LR results on 85 evidence items, finding stronger exclusionary results for 55 of them (65%)" [3]. While Perlin et al. [3] make a number of false assertions in the same online posting [4], their comments make it clear that the case I highlighted is not the only one in which the two PG systems have produced differing results. In my view, such differences warrant careful consideration by the scientific and legal communities.