Discrete models such as cellular automata may be ported from one platform or language to another to improve performance, for instance by rewriting legacy Matlab code in C++ or adding optimizations to a Python implementation. Although such transformations can offer benefits such as scalability or maintainability, they also carry the risk of introducing bugs. While standard verification techniques can always be applied, this situation presents a unique opportunity, since the two implementations can be compared directly through their simulation runs. Although comparing average results across runs of the same configuration is common practice, our paper shows that many bugs would not be detected at this aggregate level. We thus propose comparing implementations of cellular automata by analyzing their outputs as images. In this paper, we examine the detection of several implementation errors using five different techniques (supervised and unsupervised image processing, decision trees, random forests, and deep learning) across three cellular automata models (forest fire, tumor, HIV). We show that in some models, random forests can detect four out of five erroneous runs, although the accuracy depends both on the model and on the nature of the errors.
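
The sketch below illustrates the general idea of treating simulation outputs as images and classifying runs with a random forest; it is not the paper's exact pipeline. The grid sizes, run counts, and the placeholder arrays `reference_runs` and `suspect_runs` are hypothetical stand-ins for real simulation outputs.

```python
# Minimal sketch (assumed setup, not the paper's pipeline): flag erroneous
# cellular-automaton runs from their output grids using a random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical stand-ins for simulation outputs: 50 runs per implementation,
# each run summarized by its final 64x64 lattice of cell states.
reference_runs = [rng.integers(0, 3, size=(64, 64)) for _ in range(50)]
suspect_runs = [rng.integers(0, 3, size=(64, 64)) for _ in range(50)]

# Treat each output grid as an image and flatten it into a feature vector.
X = np.array([grid.ravel() for grid in reference_runs + suspect_runs])
y = np.array([0] * len(reference_runs) + [1] * len(suspect_runs))  # 0 = correct, 1 = erroneous

# Train a random forest to separate correct runs from erroneous ones.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```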