Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d16-1159

Does String-Based Neural MT Learn Source Syntax?

Abstract: We investigate whether a neural, encoder-decoder translation system learns syntactic information on the source side as a by-product of training. We propose two methods to detect whether the encoder has learned local and global source syntax. A fine-grained analysis of the syntactic structure learned by the encoder reveals which kinds of syntax are learned and which are missing.
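To make the probing idea concrete, here is a minimal sketch of how such a detection method can be set up: a linear classifier is trained on frozen encoder hidden states to predict a syntactic label for each source token, and its held-out accuracy is read as evidence of how much local syntax the representation exposes. The `encode_tokens` stand-in, the toy data, and the use of scikit-learn are illustrative assumptions, not the authors' released code.

```python
# A minimal probing sketch (an assumed setup, not the authors' code):
# train a linear classifier on frozen encoder states to predict a syntactic
# label for each source token, and read held-out accuracy as evidence of how
# much local syntax the representation exposes.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
HIDDEN_DIM = 64

def encode_tokens(tokens):
    """Stand-in for a frozen NMT encoder: one hidden-state vector per token.
    In a real probe these would be the encoder's actual hidden states."""
    return rng.normal(size=(len(tokens), HIDDEN_DIM))

def build_probe_data(sentences, token_labels):
    X, y = [], []
    for tokens, labels in zip(sentences, token_labels):
        states = encode_tokens(tokens)        # shape: [len(tokens), HIDDEN_DIM]
        X.extend(states)
        y.extend(labels)                      # e.g. one POS tag per token
    return np.array(X), np.array(y)

# Toy data; real probes use treebank-annotated sentences.
train_sents = [["the", "cat", "sat"], ["dogs", "bark"]]
train_tags  = [["DT", "NN", "VBD"], ["NNS", "VBP"]]
dev_sents   = [["a", "dog", "ran"]]
dev_tags    = [["DT", "NN", "VBD"]]

X_tr, y_tr = build_probe_data(train_sents, train_tags)
X_dev, y_dev = build_probe_data(dev_sents, dev_tags)

probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("dev probe accuracy:", probe.score(X_dev, y_dev))
```

The same recipe extends to global syntax by probing a single sentence-level vector for sentence-wide labels; a probe accuracy near the majority-class baseline suggests the corresponding syntax is not linearly recoverable from the encoder.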

Cited by 298 publications (294 citation statements); references 17 publications.
“…In Section 4.2, we conduct experiments on the multi-granularity label prediction tasks (Shi et al., 2016), and investigate the representations of NMT encoders trained on both translation data and the training data of the label prediction tasks. Experimental results show that the proposed MG-SA indeed captures useful phrase information at various levels of granularity in both scenarios (Q3).…”
Section: Methods
Mentioning confidence: 99%
“…Encoder Layers Recent works (Shi et al., 2016; Peters et al., 2018) show that different layers in the encoder tend to capture different syntactic and semantic features. Hence, there may be different needs for modeling phrase structure in each layer.…”
Section: Phrase Composition
Mentioning confidence: 99%
“…We use the default hyperparameters of de in our experiments. We fine-tune each model by training it further only on the target treebank (Shi et al., 2016). We use early stopping based on the Labeled Attachment Score (LAS) on the development set.…”
Section: Parameter Sharing
Mentioning confidence: 99%
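As a rough sketch of that fine-tuning protocol, the loop below continues training on the target treebank only and stops once the development LAS has not improved for a fixed number of epochs. The `train_one_epoch` and `evaluate_las` callables are hypothetical placeholders for a real parser's training and evaluation API, and the `state_dict` checkpointing assumes a PyTorch-style model; none of this is taken from the cited paper's code.

```python
# A sketch of fine-tuning with early stopping on development LAS.
# train_one_epoch and evaluate_las are hypothetical placeholders;
# state_dict/load_state_dict assume a PyTorch-style model.
import copy

def fine_tune(model, target_treebank, dev_treebank,
              train_one_epoch, evaluate_las,
              patience=5, max_epochs=100):
    best_las, best_state, stale_epochs = -1.0, None, 0
    for _ in range(max_epochs):
        train_one_epoch(model, target_treebank)   # continue training on the target treebank only
        las = evaluate_las(model, dev_treebank)   # Labeled Attachment Score on the dev set
        if las > best_las:
            best_las = las
            best_state = copy.deepcopy(model.state_dict())
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:          # stop once dev LAS stops improving
                break
    if best_state is not None:
        model.load_state_dict(best_state)         # restore the best checkpoint
    return best_las
```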
“…We investigate the extent to which data augmentation is useful for learning POS features, using diagnostic classifiers (Veldhoen et al., 2016; Adi et al., 2016; Shi et al., 2016) on the word representations (i.e., the output of the word-level biLSTM, h_i in Eq. 2) for the training and development data.…”
Section: Analysis of Data Augmentation
Mentioning confidence: 99%
“…[13] note that their system performs reasonably well in both tagging and parsing. [31] present an in-depth analysis of the syntactic knowledge learned by recurrent sequence-to-sequence NMT.…”
Section: Related Work
Mentioning confidence: 99%