Dynamic Binary Modification (DBM) is a technique for modifying applications transparently while they are executed, working at the level of native code. However, DBM introduces a performance overhead, which in some cases can dominate execution time, making many uses impractical. The ARM hardware ecosystem poses unique challenges for high performance DBM systems because of the large number and wide range of capabilities of the commercially available implementations: from single issue, in order cores up to 6-issue out-of-order cores and including less traditional implementations. These variations raise the question of whether it is possible to develop DBM optimisations which either improve or, at the very least, do not affect performance on all available systems and microarchitectures. To answer this question, the performance of three new optimisations for the MAMBO DBM system has been evaluated on five systems using different microarchitectures. For comparison, the overhead of DynamoRIO, a high performance DBM system which was recently ported to the ARM architecture, is also evaluated.