“…Here, we dispute the previous use of top-N accuracy [6,9,12,16,32–37] and introduce four different metrics, namely round-trip accuracy, coverage, class diversity, and Jensen-Shannon divergence [50], as seen in Figure 3, to evaluate single-step retrosynthetic models and, through them, retrosynthetic tools as a whole. All four metrics have been critically designed and assessed with the help of human domain experts (see Section 4.2 for a detailed description).…”