“…A number of works have since used deep architectures suitable for sequence modelling (e.g. RNNs, CNNs, or graph convolutional networks) [6,15,23,31,36,38,45,52,60,64,76], including encoder-decoder approaches [8,51,73,78]. Berg et al. [9] recently proposed using a Transformer model for the same task.…”