“…Next, it must be shown that the online form and the offline form converge to the same value. In this regard, is defined as follows: It can be proved that conditions (a), (b), and (c) of Lemma 2 in Reference 41 are satisfied, which completes the first step of the proof. Then the offline TD value function …”