“…For instance, dropping punctuation or removing certain words does not decrease the scores produced by the automated metric. Moreover, Deriu et al. (2022) showed that when an automated metric is used as a reward for reinforcement learning, the policy converges to suboptimal solutions that the metric nevertheless rates highly. Thus, a key future direction is to develop automatic metrics that are robust against these kinds of attacks.…”
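The perturbation check described in the excerpt can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_overlap_metric` is a hypothetical stand-in, not the metric studied in the cited work, and `robustness_gap` and `drop_punctuation` are hypothetical helper names. The idea is only to show how one might measure whether a metric's score drops when the candidate text is damaged.

```python
import string

# Translation table that deletes all ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def drop_punctuation(text: str) -> str:
    """Perturbation: remove every ASCII punctuation character."""
    return text.translate(_PUNCT_TABLE)


def toy_overlap_metric(candidate: str, reference: str) -> float:
    """Hypothetical stand-in metric (an assumption, not a real metric):
    fraction of reference words that also occur in the candidate,
    compared after lowercasing and stripping punctuation."""
    ref_words = drop_punctuation(reference).lower().split()
    cand_words = set(drop_punctuation(candidate).lower().split())
    if not ref_words:
        return 0.0
    return sum(w in cand_words for w in ref_words) / len(ref_words)


def robustness_gap(metric, candidate: str, reference: str, perturb) -> float:
    """Score of the original candidate minus score of the perturbed one.
    A gap of zero or less means the metric did not penalize the damage."""
    return metric(candidate, reference) - metric(perturb(candidate), reference)


reference = "Sure, I can help you with that."
candidate = "Sure, I can help with that!"
gap = robustness_gap(toy_overlap_metric, candidate, reference, drop_punctuation)
print(f"score drop after removing punctuation: {gap}")
```

Because this toy metric normalizes punctuation away, it is blind to the perturbation by construction. For a learned metric one would swap in its scoring function for `toy_overlap_metric` and run the same comparison over a set of perturbations.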