“…It is noteworthy that there are many other important miscellaneous works that we do not mention in the previous sections. For example, numerous works have proposed to improve upon vanilla gradient-based methods [174,178,65]; linguistic rules such as negation and morphological inflection can be extracted by neural models [141,142,158]; probing tasks can be used to explore linguistic properties of sentences [3,80,43,75,89,74,34]; the hidden state dynamics in recurrent networks have been analysed to illuminate the learned long-range dependencies [73,96,67,179,94]; [169,166,168,101,57,167] studied the ability of neural sequence models to induce lexical, grammatical and syntactic structures; [91,90,12,136,159,24,151,85] modeled the reasoning process of the model to explain its behavior; [157,139,28,163,219,170,180,137,106,58,162,81...…”