Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, 2017
DOI: 10.18653/v1/e17-1063

Dependency Parsing as Head Selection

Abstract: Conventional graph-based dependency parsers guarantee a tree structure both during training and inference. Instead, we formalize dependency parsing as the problem of independently selecting the head of each word in a sentence. Our model, which we call DENSE (shorthand for Dependency Neural Selection), produces a distribution over possible heads for each word using features obtained from a bidirectional recurrent neural network. Without enforcing structural constraints during training, DENSE generates (at inference…
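A minimal sketch of the decoding step the abstract describes, assuming the per-word head distributions are given as an (N+1) x (N+1) array with the artificial ROOT at index 0; the array layout is an assumption, and `chu_liu_edmonds` is a hypothetical placeholder for the maximum spanning tree fallback the abstract mentions, not the authors' code.

```python
def is_tree(heads):
    """Check that a head assignment forms a tree: every word must reach
    ROOT (index 0) by following head pointers without revisiting a node."""
    for i in range(1, len(heads)):
        seen, node = {i}, heads[i]
        while node != 0:
            if node in seen:          # cycle found: not a tree
                return False
            seen.add(node)
            node = heads[node]
    return True

def decode_heads(head_probs):
    """Greedy head selection: head_probs[j, i] is the probability that
    word j heads word i (assumed layout, ROOT at index 0). Each word
    independently takes its most probable head; non-tree outputs are
    repaired with a maximum spanning tree, as described in the abstract."""
    heads = head_probs.argmax(axis=0)
    if is_tree(heads):
        return heads
    return chu_liu_edmonds(head_probs)   # MST repair; placeholder, not shown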

Cited by 82 publications (79 citation statements)
References 33 publications
“…Then a dependency label is predicted for each child-parent pair. This approach is related to Dozat and Manning (2017) and Zhang et al. (2017), where the main difference is that our model works in a multi-task framework. To predict the parent node of w_t, we define a matching function between w_t and the candidates of the parent node as m(t, j) = h…”
Section: Syntactic Task: Dependency Parsing (mentioning)
Confidence: 99%
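The matching function in this snippet is cut off after m(t, j) = h…; a common parameterization, assumed here as a stand-in, is a bilinear score between each word's hidden state and every candidate parent's:

```python
import torch

def parent_distribution(h, W):
    """Bilinear matching over candidate parents, an assumed stand-in for
    the truncated m(t, j) in the snippet: scores[t, j] = h_t^T W h_j.
    h: (seq_len, dim) encoder hidden states; W: (dim, dim) learned matrix."""
    scores = h @ W @ h.T                      # scores[t, j] = h_t^T W h_j
    scores.fill_diagonal_(float("-inf"))      # a word cannot be its own parent
    return torch.softmax(scores, dim=1)       # distribution over parents j
```

Here `h` is whatever encoder output the model exposes; in the cited multi-task setup it would come from the dependency layer of the shared stack.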
“…Both of … and Tai et al. (2015) explicitly used syntactic trees, and relied on attention mechanisms. However, our method uses the simple max-pooling strategy, which suggests that it is worth…” [flattened results tables from the citing paper; recoverable captions: Table 4: Dependency results. Table 7: Effectiveness of the Shortcut Connections (SC) and the Label Embeddings (LE).]
Section: POS Tagging (mentioning)
Confidence: 99%
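As a point of contrast with the tree- and attention-based composition the snippet mentions, the "simple max-pooling strategy" is a one-liner over precomputed encoder states; this sketch assumes a (seq_len, dim) tensor of states:

```python
import torch

def max_pool_sentence(states):
    """Element-wise max over time steps: the simple max-pooling strategy
    the snippet credits. states: (seq_len, dim) encoder outputs -> (dim,)."""
    return states.max(dim=0).values
```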
“…Following Chen and Manning (2014) and Dyer et al. (2015), we used 300-dimensional pretrained GloVe vectors (Pennington et al., 2014) to initialize our word embedding matrix. Other model parameters are initialized using a normal distribution with a mean of 0 and a variance of…” [baseline abbreviations from the citing paper's results table: (Zhang and Nivre, 2011); C&M14 (Chen and Manning, 2014); ConBSO (Wiseman and Rush, 2016); Dyer15 (Dyer et al., 2015); Weiss15 (Weiss et al., 2015); K&G16 (Kiperwasser and Goldberg, 2016); DENSE (Zhang et al., 2017)]
Section: Setup (mentioning)
Confidence: 99%
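A sketch of the initialization the snippet describes. The fallback standard deviation is an assumption, since the variance value is elided in the quote, and `glove` is assumed to be a word-to-vector dictionary:

```python
import numpy as np

def init_embedding_matrix(vocab, glove, dim=300, std=0.01):
    """Initialize embeddings from 300-d pretrained GloVe vectors as in the
    quoted setup. Words not covered by GloVe keep a zero-mean normal init;
    std=0.01 is an assumed value (the variance is elided in the snippet).
    vocab: dict word -> row index; glove: dict word -> np.ndarray of size dim."""
    E = np.random.normal(0.0, std, size=(len(vocab), dim))
    for word, idx in vocab.items():
        if word in glove:
            E[idx] = glove[word]
    return E
```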
“…This poses a problem for most statistical parsers. In our work, we focus on recent neural dependency parsers which, instead of using hand-crafted feature templates, directly learn the features from the training data (Chen and Manning, 2014; Zhang et al., 2017). These parsers usually introduce an UNKNOWN token for out-of-vocabulary words.…”
Section: The Problem With Compounds (mentioning)
Confidence: 99%
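The UNKNOWN-token lookup the snippet refers to is a one-liner; the token name and lowercasing here are illustrative choices, not any particular parser's convention. The compound problem follows directly: every unseen compound collapses to the same id and thus the same vector.

```python
def token_id(word, vocab, unk="<UNK>"):
    """Map out-of-vocabulary words to a single UNKNOWN token id, the
    strategy the snippet attributes to neural dependency parsers. The
    token name and lowercasing are illustrative assumptions."""
    return vocab.get(word.lower(), vocab[unk])
```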
“…Our parsing model is an extension of the head-selection parser of Zhang et al. (2017) (Figure 1). Given the sentence S = (w_0, w_1, …, w_N) and x_i as the input representation of word w_i, the model…”
Section: Parsing Model (mentioning)
Confidence: 99%
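The quoted sentence breaks off before the model definition; in Zhang et al. (2017) each word's head distribution comes from an MLP score over BiLSTM states, roughly g(a_j, a_i) = v^T tanh(U a_j + W a_i), normalized over candidate heads j. A sketch under assumed dimensions, not the authors' exact implementation:

```python
import torch
import torch.nn as nn

class HeadScorer(nn.Module):
    """Head scorer in the style of Zhang et al. (2017):
    g(a_j, a_i) = v^T tanh(U a_j + W a_i), where a_i is the BiLSTM state
    built from the input representation x_i of word w_i. Dimensions and
    the single-sentence interface are assumptions of this sketch."""
    def __init__(self, dim, hidden=100):
        super().__init__()
        self.U = nn.Linear(dim, hidden, bias=False)   # transforms head a_j
        self.W = nn.Linear(dim, hidden, bias=False)   # transforms dependent a_i
        self.v = nn.Linear(hidden, 1, bias=False)

    def forward(self, a):
        # a: (N+1, dim) BiLSTM states, position 0 = artificial ROOT w_0
        g = self.v(torch.tanh(self.U(a).unsqueeze(1) +
                              self.W(a).unsqueeze(0))).squeeze(-1)
        # g[j, i] = score of w_j as head of w_i; normalize over heads j
        return torch.softmax(g, dim=0)
```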