Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.376

Enriched In-Order Linearization for Faster Sequence-to-Sequence Constituent Parsing

Abstract: Sequence-to-sequence constituent parsing requires a linearization to represent trees as sequences. Top-down tree linearizations, which can be based on brackets or shift-reduce actions, have achieved the best accuracy to date. In this paper, we show that these results can be improved by using an in-order linearization instead. Based on this observation, we implement an enriched in-order shift-reduce linearization inspired by Vinyals et al. (2015)'s approach, achieving the best accuracy to date on the English PTB …
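To make the two linearization styles concrete, here is a minimal Python sketch of both on a toy tree. The tree encoding, the XX placeholder for POS tags, and the action names (SHIFT, PJ-X, REDUCE) are illustrative assumptions, not the exact vocabulary used in the paper:

# Minimal sketch contrasting two tree linearizations for seq2seq parsing.
# Tree format and action names are illustrative, not the paper's exact ones.

# A toy constituent tree as (label, children); word leaves are plain strings.
TREE = ("S",
        [("NP", [("DT", ["The"]), ("NN", ["cat"])]),
         ("VP", [("VBZ", ["sleeps"])])])

def top_down_brackets(node):
    """Top-down bracket linearization (Vinyals et al. 2015 style):
    emit the opening bracket with its label, recurse, then close."""
    label, children = node
    if len(children) == 1 and isinstance(children[0], str):
        return ["XX"]                      # POS tags normalized to XX
    out = [f"({label}"]
    for child in children:
        out += top_down_brackets(child)
    out.append(f"){label}")
    return out

def in_order_actions(node):
    """In-order shift-reduce linearization: first child, then the
    nonterminal projection, then the remaining children, then REDUCE."""
    label, children = node
    if len(children) == 1 and isinstance(children[0], str):
        return ["SHIFT"]                   # a word is shifted from the buffer
    out = in_order_actions(children[0])    # visit the first child first
    out.append(f"PJ-{label}")              # then project the nonterminal
    for child in children[1:]:
        out += in_order_actions(child)
    out.append("REDUCE")                   # close the constituent
    return out

print(" ".join(top_down_brackets(TREE)))
# (S (NP XX XX )NP (VP XX )VP )S
print(" ".join(in_order_actions(TREE)))
# SHIFT PJ-NP SHIFT REDUCE PJ-S SHIFT PJ-VP REDUCE REDUCE

The difference is the position of the nonterminal in the output sequence: top-down emits it before any of its children, while in-order emits it after the first child, which is what the paper exploits for its enriched linearization.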

Cited by 10 publications (11 citation statements)
References 19 publications
“…In this respect, a practical characteristic of sequence labeling approaches to parsing is that they are more efficient than seq2seq models. For example, the single-core speeds of the seq2seq constituent parsers of Fernández-González and Gómez-Rodríguez [143], albeit optimized for speed, are an order of magnitude slower than those of sequence labeling constituent parsers [121,122]. This is compounded by the fact that sequence labeling is much easier to parallelize, so that the differences can be even larger in multi-core settings.…”
Section: Discussion
confidence: 99%
“…However, parsers of this type suffer from exposure bias during inference. Besides these methods, Seq2Seq models have been used to generate a linearized form of the tree (Vinyals et al., 2015b; Kamigaito et al., 2017; Suzuki et al., 2018; Fernández-González and Gómez-Rodríguez, 2020a). However, these methods may generate invalid trees when the opening and closing brackets do not match.…”
Section: Related Work
confidence: 99%
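The bracket-mismatch failure mode mentioned above can be screened mechanically. A hypothetical post-hoc validity check in Python, assuming the Vinyals-style bracket tokens sketched earlier (not a routine from any of the cited parsers):

# Hypothetical check for the failure mode above: a seq2seq decoder can emit
# a token sequence whose brackets do not form a well-formed tree.
def is_well_formed(tokens):
    """Return True if opening/closing brackets nest and match by label."""
    stack = []
    for tok in tokens:
        if tok.startswith("("):
            stack.append(tok[1:])          # remember the open label
        elif tok.startswith(")"):
            if not stack or stack.pop() != tok[1:]:
                return False               # mismatched or unopened bracket
    return not stack                       # everything opened was closed

print(is_well_formed("(S (NP XX )NP )S".split()))   # True
print(is_well_formed("(S (NP XX )VP )S".split()))   # False: )VP closes (NP
print(is_well_formed("(S (NP XX )NP".split()))      # False: (S never closed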
“…In fact, they lagged behind classic parsers based on explicit tree-structured algorithms and supported by a more extensive research background. The gap between task-specific constituent parsers and sequence-to-sequence models cannot be quantified only in terms of accuracy and speed (Fernández-González and Gómez-Rodríguez, 2020b), but also in coverage: to the best of our knowledge, the latter have not been applied to discontinuous constituent parsing to date.…”
Section: Introduction
confidence: 99%
“…• The implementation of a novel sequence-to-sequence constituent parser, building on the work developed by Fernández-González and Gómez-Rodríguez (2020b) and Fernandez Astudillo, Ballesteros, Naseem, Blodgett and Florian (2020). While the former defines linearizations for continuous parsing that outperform those previously proposed, the latter introduces a deterministic attention technique over a powerful Transformer sequence-to-sequence architecture (Ott, Edunov, Baevski, Fan, Gross, Ng, Grangier and Auli, 2019) that significantly increases prediction accuracy.…”
Section: Introduction
confidence: 99%