Unsupervised Neural Dependency Parsing

Jiang, Yong; Han, Wenjuan; Tu, Kewei

doi:10.18653/v1/d16-1073

Cited by 54 publications

(75 citation statements)

References 8 publications

“…
Method | WSJ10 | WSJ
Unlexicalized Approaches, with WSJ10
EVG (Headden III et al, 2009) | 65.0 | -
TSG-DMV (Blunsom and Cohn, 2010) | 65.9 | 53.1
PR-S (Gillenwater et al, 2010) | 64.3 | 53.3
HDP-DEP (Naseem et al, 2010) | 73.8 | -
UR-A E-DMV (Tu and Honavar, 2012) | 71.4 | 57.0
Neural E-DMV (Jiang et al, 2016) | 72.5 | 57.6
Systems Using Lexical Information and/or More Data
LexTSG-DMV (Blunsom and Cohn, 2010) | 67.7 | 55.7
L-EVG (Headden III et al, 2009) | 68.8 | -
CS (Spitkovsky et al, 2013) | 72.0 | 64.4
MaxEnc (Le and Zuidema, 2015) | 73.
Again, we find that good initialization leads to better performance than KM initialization, and both good initialization and KM initialization are significantly better than random and uniform initialization.…”

Section: Methods (mentioning)

confidence: 99%

“…Here we employ a neural approach to smoothing. Specifically, we propose a lexicalized extension of neural DMV (Jiang et al, 2016) and we call the resulting approach L-NDMV.…”

Section: Lexicalized NDMV (mentioning)

confidence: 99%

“…It was previously shown that the heuristic KM initialization method by Klein and Manning (2004) does not work well for lexicalized grammar induction (Headden III et al, 2009; Pate and Johnson, 2016) and it is very helpful to initialize learning with a model learned by a different grammar induction method (Le and Zuidema, 2015; Jiang et al, 2016). We tested both KM initialization and the following initialization method: we first learn an unlexicalized DMV using the grammar induction method of Naseem et al (2010) and use it to parse the training corpus; then, from the parse trees we run maximum likelihood estimation to produce the initial lexicalized model.…”

Section: Model Initialization (mentioning)

confidence: 99%
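
The initialization quoted above (parse the training corpus with an already-learned model, then run maximum likelihood estimation on the resulting trees) amounts to relative-frequency counting over the parses. The following is a minimal sketch of that idea, not the citing authors' implementation. The tree representation (one head index per token, with -1 for the root), the function name `mle_attachment_probs`, and the restriction to attachment probabilities conditioned on head token and direction (ignoring the valence and stop decisions of DMV) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def mle_attachment_probs(sentences, heads):
    """Relative-frequency estimate of P(child | head, direction).

    sentences: list of token lists (hypothetical lexicalized tokens).
    heads: list of head-index lists; heads[s][i] is the index of the head
           of token i in sentence s, or -1 if token i is the root.
    """
    counts = defaultdict(Counter)
    for tokens, hs in zip(sentences, heads):
        for child_idx, head_idx in enumerate(hs):
            if head_idx < 0:
                continue  # the root attachment is skipped in this sketch
            direction = "left" if child_idx < head_idx else "right"
            counts[(tokens[head_idx], direction)][tokens[child_idx]] += 1

    # Normalize the counts within each (head, direction) context.
    probs = {}
    for context, child_counts in counts.items():
        total = sum(child_counts.values())
        probs[context] = {c: n / total for c, n in child_counts.items()}
    return probs

# Toy parses standing in for the output of a previously learned parser.
sentences = [["the", "dog", "barks"], ["the", "cat", "barks"]]
heads = [[1, 2, -1], [1, 2, -1]]
initial_model = mle_attachment_probs(sentences, heads)
```

In this toy example, `initial_model[("barks", "left")]` splits its probability mass evenly between "dog" and "cat", since each is attached once to the left of "barks".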

“…Finally, smoothing techniques can be used to reduce the negative impact of data scarcity. One example is Neural DMV (NDMV) (Jiang et al, 2016) which incorporates neural networks into DMV and can automatically smooth correlated grammar rule probabilities.…”

Section: Introduction (mentioning)

confidence: 99%
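
One way to see how a neural parameterization can smooth correlated rule probabilities is to compute each distribution from embeddings instead of storing it in an independent table. The sketch below is only an illustration of that general idea and does not reproduce the NDMV architecture: the dot-product scoring, the embedding dimension, the tag names, and all numeric values are hypothetical.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def child_distribution(head_vec, child_vecs):
    """P(child | head) as a softmax over embedding dot products."""
    scores = [sum(h * c for h, c in zip(head_vec, cv)) for cv in child_vecs]
    return softmax(scores)

random.seed(0)
tags = ["NN", "NNS", "VB"]
dim = 4
# Hypothetical child-tag embeddings, randomly initialized.
child_vecs = [[random.gauss(0, 1) for _ in range(dim)] for _ in tags]

# Two head tags with nearly identical (hypothetical) embeddings.
head_nn = [0.5, -0.2, 0.1, 0.3]
head_nns = [0.5, -0.2, 0.1, 0.31]

p_nn = child_distribution(head_nn, child_vecs)
p_nns = child_distribution(head_nns, child_vecs)
```

Because both distributions are produced by the same scoring function, heads with similar embeddings receive similar child distributions, so evidence about one head informs the other. A count-based table would estimate the two rows separately.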

Dependency Grammar Induction with Neural Lexicalization and Big Training Data

Han

Jiang

2017

Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Self Cite

We study the impact of big models (in terms of the degree of lexicalization) and big data (in terms of the training corpus size) on dependency grammar induction. We experimented with L-DMV, a lexicalized version of Dependency Model with Valence (Klein and Manning, 2004) and L-NDMV, our lexicalized extension of the Neural Dependency Model with Valence (Jiang et al., 2016). We find that L-DMV only benefits from very small degrees of lexicalization and moderate sizes of training corpora. L-NDMV can benefit from big training data and lexicalization of greater degrees, especially when enhanced with good model initialization, and it achieves a result that is competitive with the current state-of-the-art.

“…where D(x) is the set of all possible dependency arcs of sentence x, 1[·] is the indicator function, and µ(x, i, j) is the expected count defined as follows, (Jiang et al, 2016), and Convex-MST (Grave and Elhadad, 2015)
Methods | WSJ10 | WSJ
Basic Setup
Feature DMV (Berg-Kirkpatrick et al, 2010) | 63.0 | -
UR-A E-DMV (Tu and Honavar, 2012) | 71.4 | 57.0
Neural E-DMV (Jiang et al, 2016) | 69.7 | 52.5
Neural E-DMV (Good Init) (Jiang et al, 2016) | 72.5 | 57.6
Basic Setup + Universal Linguistic Prior
Convex-MST (Grave and Elhadad, 2015) | 60.8 | 48.6
HDP-DEP (Naseem et al, 2010) | 71.9 | -
CRFAE | 71.7 | 55.7
Systems Using Extra Info
LexTSG-DMV (Blunsom and Cohn, 2010) | 67.7 | 55.7
CS (Spitkovsky et al, 2013) | 72.0 | 64.4
MaxEnc (Le and Zuidema, 2015) | 73.2 | 65.8
Table 3: Comparison of recent unsupervised dependency parsing systems on English. Basic setup is the same as our setup except that linguistic prior is not used.…”

Section: Algorithm (mentioning)

confidence: 99%

CRF Autoencoder for Unsupervised Dependency Parsing

Cai

Jiang

2017

Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Self Cite

Unsupervised dependency parsing, which tries to discover linguistic dependency structures from unannotated data, is a very challenging task. Almost all previous work on this task focuses on learning generative models. In this paper, we develop an unsupervised dependency parsing model based on the CRF autoencoder. The encoder part of our model is discriminative and globally normalized which allows us to use rich features as well as universal linguistic priors. We propose an exact algorithm for parsing as well as a tractable learning algorithm. We evaluated the performance of our model on eight multilingual treebanks and found that our model achieved comparable performance with state-of-the-art approaches.
