Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, 2017
DOI: 10.18653/v1/e17-1063

Dependency Parsing as Head Selection

Abstract: Conventional graph-based dependency parsers guarantee a tree structure both during training and inference. Instead, we formalize dependency parsing as the problem of independently selecting the head of each word in a sentence. Our model, which we call DENSE (shorthand for Dependency Neural Selection), produces a distribution over possible heads for each word using features obtained from a bidirectional recurrent neural network. Without enforcing structural constraints during training, DENSE generates (at inference…
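A minimal sketch of the decoding step the abstract describes, assuming the per-word head distributions are given as an (N+1) x (N+1) array with the artificial ROOT at index 0; the array layout is an assumption, and `chu_liu_edmonds` is a hypothetical placeholder for the maximum spanning tree fallback the abstract mentions, not the authors' code.

```python
def is_tree(heads):
    """Check that a head assignment forms a tree: every word must reach
    ROOT (index 0) by following head pointers without revisiting a node."""
    for i in range(1, len(heads)):
        seen, node = {i}, heads[i]
        while node != 0:
            if node in seen:          # cycle found: not a tree
                return False
            seen.add(node)
            node = heads[node]
    return True

def decode_heads(head_probs):
    """Greedy head selection: head_probs[j, i] is the probability that
    word j heads word i (assumed layout, ROOT at index 0). Each word
    independently takes its most probable head; non-tree outputs are
    repaired with a maximum spanning tree, as described in the abstract."""
    heads = head_probs.argmax(axis=0)
    if is_tree(heads):
        return heads
    return chu_liu_edmonds(head_probs)   # MST repair; placeholder, not shown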

Cited by 82 publications (79 citation statements)
References 33 publications
“…Then a dependency label is predicted for each child-parent pair. This approach is related to Dozat and Manning (2017) and Zhang et al. (2017), where the main difference is that our model works in a multi-task framework. To predict the parent node of w_t, we define a matching function between w_t and the candidates of the parent node as m(t, j) = h…”
Section: Syntactic Task: Dependency Parsing (mentioning)
Confidence: 99%
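The matching function in this snippet is cut off after m(t, j) = h…; a common parameterization, assumed here as a stand-in, is a bilinear score between each word's hidden state and every candidate parent's:

```python
import torch

def parent_distribution(h, W):
    """Bilinear matching over candidate parents, an assumed stand-in for
    the truncated m(t, j) in the snippet: scores[t, j] = h_t^T W h_j.
    h: (seq_len, dim) encoder hidden states; W: (dim, dim) learned matrix."""
    scores = h @ W @ h.T                      # scores[t, j] = h_t^T W h_j
    scores.fill_diagonal_(float("-inf"))      # a word cannot be its own parent
    return torch.softmax(scores, dim=1)       # distribution over parents j
```

Here `h` is whatever encoder output the model exposes; in the cited multi-task setup it would come from the dependency layer of the shared stack.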
“…Both of … and Tai et al. (2015) explicitly used syntactic trees, and relied on attention mechanisms. However, our method uses the simple max-pooling strategy, which suggests that it is worth…” [flattened results tables from the citing paper; recoverable captions: Table 4: Dependency results. Table 7: Effectiveness of the Shortcut Connections (SC) and the Label Embeddings (LE).]
Section: POS Tagging (mentioning)
Confidence: 99%
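As a point of contrast with the tree- and attention-based composition the snippet mentions, the "simple max-pooling strategy" is a one-liner over precomputed encoder states; this sketch assumes a (seq_len, dim) tensor of states:

```python
import torch

def max_pool_sentence(states):
    """Element-wise max over time steps: the simple max-pooling strategy
    the snippet credits. states: (seq_len, dim) encoder outputs -> (dim,)."""
    return states.max(dim=0).values
```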
“…Following Chen and Manning (2014) and Dyer et al. (2015), we used 300-dimensional pretrained GloVe vectors (Pennington et al., 2014) to initialize our word embedding matrix. Other model parameters are initialized using a normal distribution with a mean of 0 and a variance of…” [baseline abbreviations from the citing paper's results table: (Zhang and Nivre, 2011); C&M14 (Chen and Manning, 2014); ConBSO (Wiseman and Rush, 2016); Dyer15 (Dyer et al., 2015); Weiss15 (Weiss et al., 2015); K&G16 (Kiperwasser and Goldberg, 2016); DENSE (Zhang et al., 2017)]
Section: Setup (mentioning)
Confidence: 99%
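A sketch of the initialization the snippet describes. The fallback standard deviation is an assumption, since the variance value is elided in the quote, and `glove` is assumed to be a word-to-vector dictionary:

```python
import numpy as np

def init_embedding_matrix(vocab, glove, dim=300, std=0.01):
    """Initialize embeddings from 300-d pretrained GloVe vectors as in the
    quoted setup. Words not covered by GloVe keep a zero-mean normal init;
    std=0.01 is an assumed value (the variance is elided in the snippet).
    vocab: dict word -> row index; glove: dict word -> np.ndarray of size dim."""
    E = np.random.normal(0.0, std, size=(len(vocab), dim))
    for word, idx in vocab.items():
        if word in glove:
            E[idx] = glove[word]
    return E
```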
“…This poses a problem for most statistical parsers. In our work, we focus on recent neural dependency parsers which, instead of using hand-crafted feature templates, directly learn the features from the training data (Chen and Manning, 2014; Zhang et al., 2017). These parsers usually introduce an UNKNOWN token for out-of-vocabulary words.…”
Section: The Problem With Compounds (mentioning)
Confidence: 99%
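The UNKNOWN-token lookup the snippet refers to is a one-liner; the token name and lowercasing here are illustrative choices, not any particular parser's convention. The compound problem follows directly: every unseen compound collapses to the same id and thus the same vector.

```python
def token_id(word, vocab, unk="<UNK>"):
    """Map out-of-vocabulary words to a single UNKNOWN token id, the
    strategy the snippet attributes to neural dependency parsers. The
    token name and lowercasing are illustrative assumptions."""
    return vocab.get(word.lower(), vocab[unk])
```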
“…Our parsing model is an extension of the head-selection parser of Zhang et al. (2017) (Figure 1). Given the sentence S = (w_0, w_1, …, w_N) and x_i as the input representation of word w_i, the model…”
Section: Parsing Model (mentioning)
Confidence: 99%
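The quoted sentence breaks off before the model definition; in Zhang et al. (2017) each word's head distribution comes from an MLP score over BiLSTM states, roughly g(a_j, a_i) = v^T tanh(U a_j + W a_i), normalized over candidate heads j. A sketch under assumed dimensions, not the authors' exact implementation:

```python
import torch
import torch.nn as nn

class HeadScorer(nn.Module):
    """Head scorer in the style of Zhang et al. (2017):
    g(a_j, a_i) = v^T tanh(U a_j + W a_i), where a_i is the BiLSTM state
    built from the input representation x_i of word w_i. Dimensions and
    the single-sentence interface are assumptions of this sketch."""
    def __init__(self, dim, hidden=100):
        super().__init__()
        self.U = nn.Linear(dim, hidden, bias=False)   # transforms head a_j
        self.W = nn.Linear(dim, hidden, bias=False)   # transforms dependent a_i
        self.v = nn.Linear(hidden, 1, bias=False)

    def forward(self, a):
        # a: (N+1, dim) BiLSTM states, position 0 = artificial ROOT w_0
        g = self.v(torch.tanh(self.U(a).unsqueeze(1) +
                              self.W(a).unsqueeze(0))).squeeze(-1)
        # g[j, i] = score of w_j as head of w_i; normalize over heads j
        return torch.softmax(g, dim=0)
```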