Although state-of-the-art deep reinforcement learning often achieves superhuman performance on some tasks, researchers still struggle to analyze, compare, and report the obtained results due to the unstable nature of the algorithms and the diversity of metrics used in the literature. Furthermore, these metrics fail to capture some characteristics of the learning process, leading to misinterpretations by the analyst. The objective of this paper is to propose, implement, and analyze a difference-based evaluation metric that highlights different aspects of the learning process, allowing for more detailed results and analyses usable by automated software or by analysts, experienced or not. One possible application of the proposed metric is the creation of automated evaluation systems that detect anomalies during the training process.

INDEX TERMS Machine learning, artificial intelligence (AI), differential equations, metrics.