Specifically, $\Omega_{S,k}$ is initialized as the current network parameter $\Omega$ and is then updated with $G_{\mathrm{Tr}}$ steps of gradient descent. Since an overly large gradient destabilizes gradient descent, we limit the values of the source-task-specific gradient $\nabla_{\Omega_{S,k}} \mathrm{Loss}_{\mathcal{D}_{\mathrm{Tr}}^{\mathrm{Sup}}(k)}(\Omega_{S,k})$ to a certain range and obtain the truncated source-task-specific gradient $\upsilon_{S,k}$ as [42]
$$
[\upsilon_{S,k}]_p = \min\left(\Upsilon,\ \left[\nabla_{\Omega_{S,k}} \mathrm{Loss}_{\mathcal{D}_{\mathrm{Tr}}^{\mathrm{Sup}}(k)}(\Omega_{S,k})\right]_p\right), \quad p = 1, \cdots, \mathrm{len}(\upsilon_{S,k}), \tag{18}
$$
where $\Upsilon$ is the upper threshold of the gradient. Generally, an overly large $\Upsilon$ may cause large fluctuations of the loss function, whereas a too-small $\Upsilon$ distorts the direction of the update, resulting in premature convergence.
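To make the update concrete, the following is a minimal PyTorch sketch of the truncated inner-loop adaptation, not the paper's implementation: the function and variable names (`loss_fn`, `support_data`, `lr`, `num_steps`) are hypothetical placeholders, and only the upper-threshold truncation of Eq. (18) is assumed.

```python
import torch

def truncate_gradient(grad: torch.Tensor, upper_threshold: float) -> torch.Tensor:
    # Element-wise truncation of Eq. (18): cap each gradient component
    # at the upper threshold Upsilon (no lower bound is applied, per the text).
    return torch.minimum(grad, torch.full_like(grad, upper_threshold))

def source_task_adaptation(omega: torch.Tensor, loss_fn, support_data,
                           num_steps: int, lr: float, upper_threshold: float) -> torch.Tensor:
    # Initialize Omega_{S,k} from the current network parameter Omega,
    # then apply G_Tr (= num_steps) truncated gradient-descent updates
    # on the support set of source task k.
    omega_sk = omega.detach().clone().requires_grad_(True)
    for _ in range(num_steps):
        loss = loss_fn(omega_sk, support_data)
        (grad,) = torch.autograd.grad(loss, omega_sk)
        upsilon = truncate_gradient(grad, upper_threshold)  # Eq. (18)
        with torch.no_grad():
            omega_sk -= lr * upsilon  # plain gradient-descent step
    return omega_sk.detach()
```

The choice of `upper_threshold` mirrors the trade-off stated above: too large a value lets the loss fluctuate, while too small a value distorts the update direction.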