“…Another reason is that the mean reward under the optimal TR, seen as a functional $\Psi$, is pathwise differentiable at $Q_0$ if and only if, $Q_0$-almost surely, either $|q_{Y,0}(W)| > 0$ or the conditional distributions of $Y$ given $(A = 1, W)$ and $(A = 0, W)$ under $Q_0$ are degenerate [19, Theorem 1]. This explains why it is also assumed that the true law is not exceptional in [34, 18, 20]. Other approaches have been considered to circumvent the need to make this assumption: relying on the $m$-out-of-$n$ bootstrap [4] (at the cost of a $\sqrt{m}$-rate of convergence and the need to fine-tune $m$), or changing the parameter of interest by focusing on the mean reward under the optimal TR conditional on patients for whom the best treatment has a clinically meaningful effect (truncation) [12, 16, 17].…”