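“…t $R_{b,T+1}(\pi_{b,t}) \ge \sum_{t=T-\tau}^{T} \alpha_{b,t}\, R_{b,T+1}(C^*_{b,t}) - \frac{\mathrm{Reg}_{b,T,\tau}(\pi_{b,t})}{\tau}$. (36)

From (33), we have

$$\mathbb{E}\big[R_{b,T+1}(\pi^{(\mathrm{av})}_{b,T+1}) \,\big|\, Z^{T}_{b,1}\big] \ge \sum_{t=T-\tau}^{T} \alpha_{b,t}\, R_{b,T+1}(C^*_{b,t}) - \frac{\mathrm{Reg}_{b,T,\tau}(\pi_{b,t})}{\tau} - H_{\max}\,\|\alpha_{b,T}\|_2^2\,\tau \log\frac{1}{\delta} - M_{b,T+1}(w_{\neq b,T}) - D_{b,T}(\alpha_{b,T,\tau}).$$

[…] +1, we get

$$\mathbb{E}\big[R_{b,T+1}(\pi^{(\mathrm{av})}_{b,T+1}) \,\big|\, Z^{T}_{b,1}\big] \ge \mathbb{E}\big[R_{b,T+1}(C^*_{b,T+1}) \,\big|\, Z^{T}_{b,1}\big] - \frac{2\,\mathrm{Reg}_{b,T,\tau}(\pi_{b,t})}{\tau} - 2H_{\max}\,\|\alpha_{b,T}\|_2^2\,\tau \log\frac{1}{\delta} - M_{b,T+1}(w_{\neq b,T}) - 2D_{b,T}(\alpha_{b,T,\tau}).$$…”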