An Empirical Comparison of Unsupervised Constituency Parsing Methods

Li, Jun; Cao, Yifan; Cai, Jiong; Jiang, Yong; Tu, Kewei

doi:10.18653/v1/2020.acl-main.300

Cited by 15 publications

(17 citation statements)

References 15 publications

“…Early work in unsupervised PCFG induction from raw text (Johnson et al, 2007; Liang et al, 2009; Tu, 2012) was not as successful as models of unsupervised constituency parsing (Seginer, 2007; Ponvert et al, 2011). However, recent work from unsupervised parsing (Shen et al, 2019; Drozdov et al, 2019, 2020) and grammar induction (Jin et al, 2018a, 2019; Zhu et al, 2020; Jin and Schuler, 2020; Li et al, 2020) shows much improvement over previous results with grammars learned solely from raw text, indicating that statistical regularities relevant to syntactic acquisition can be found in word collocations. For example, propose a word-based neural compound PCFG induction model for accurate grammar induction on English.…”

Section: Related Work (mentioning)

confidence: 83%

Character-based PCFG Induction for Modeling the Syntactic Acquisition of Morphologically Rich Languages

Jin, Oh, Schuler

2021

Findings of the Association for Computational Linguistics: EMNLP 2021

Unsupervised PCFG induction models, which build syntactic structures from raw text, can be used to evaluate the extent to which syntactic knowledge can be acquired from distributional information alone. However, many state-of-the-art PCFG induction models are word-based, meaning that they cannot directly inspect functional affixes, which may provide crucial information for syntactic acquisition in child learners. This work first introduces a neural PCFG induction model that allows a clean ablation of the influence of subword information in grammar induction. Experiments on child-directed speech demonstrate first that the incorporation of subword information results in more accurate grammars with categories that word-based induction models have difficulty finding, and second that this effect is amplified in morphologically richer languages that rely on functional affixes to express grammatical relations. A subsequent evaluation on multilingual treebanks shows that the model with subword information achieves state-of-the-art results on many languages, further supporting a distributional model of syntactic acquisition.

“…We report F1 scores on test sentences of length ≤ 10 and of all lengths. For the performance of the original DIORA, we rerun the experiments with the hyper-parameters provided by (Li et al, 2020). Since the predicted parse tree is binary, we also provide the upper bound of F1 scores without tree binarization for each dataset.…”

Section: Results (mentioning)

confidence: 99%
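
As an illustration of the evaluation described in the statement above, the following is a minimal sketch of unlabeled bracketing F1 with an optional sentence-length cutoff. The span representation, the function names `span_f1` and `mean_f1`, and the choice to average F1 per sentence are assumptions made for illustration. They are not taken from the cited papers, whose conventions (trivial spans, punctuation handling, corpus-level versus sentence-level averaging) may differ.

```python
from typing import Iterable, Optional, Set, Tuple

Span = Tuple[int, int]  # (start, end) word offsets, end exclusive


def span_f1(pred: Set[Span], gold: Set[Span]) -> float:
    """Unlabeled bracketing F1 between two sets of constituent spans."""
    if not pred and not gold:
        return 1.0  # assumption: two empty span sets count as a perfect match
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_f1(
    examples: Iterable[Tuple[int, Set[Span], Set[Span]]],
    max_len: Optional[int] = None,
) -> float:
    """Average sentence-level F1 over (length, predicted spans, gold spans).

    If max_len is given, only sentences with length <= max_len are scored.
    """
    scores = [
        span_f1(pred, gold)
        for length, pred, gold in examples
        if max_len is None or length <= max_len
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Gold tree (w0 w1 (w2 w3)) is not binary; the predicted tree
# ((w0 w1) (w2 w3)) is binary and contains every gold span plus one extra.
gold_short = {(0, 4), (2, 4)}
pred_short = {(0, 4), (0, 2), (2, 4)}

# A 12-word sentence where only the whole-sentence span is compared.
gold_long = {(0, 12)}
pred_long = {(0, 12)}

examples = [(4, pred_short, gold_short), (12, pred_long, gold_long)]
print(mean_f1(examples, max_len=10))  # scores only the 4-word sentence
print(mean_f1(examples))              # scores both sentences
```

In the 4-word example the binary prediction recovers every gold span, yet its extra bracket lowers precision, so the score stays below 1. This is the intuition behind reporting an upper bound for binary trees evaluated against non-binarized gold trees.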

“…Following the settings of (Li et al, 2020), we preprocessed the corpora. For punctuation marks, for each language we run two experiments, one with punctuation and one without.…”

Section: Datasets and Setting (mentioning)

confidence: 99%

Deep Inside-outside Recursive Autoencoder with All-span Objective

Hong, Cai

2020

Proceedings of the 28th International Conference on Computational Linguistics

Self Cite

Deep inside-outside recursive autoencoder (DIORA) is a neural-based model designed for unsupervised constituency parsing. During its forward computation, it provides phrase and contextual representations for all spans in the input sentence. By utilizing the contextual representation of each leaf-level span, the span of length 1, to reconstruct the word inside the span, the model is trained without labeled data. In this work, we extend the training objective of DIORA by making use of all spans instead of only leaf-level spans. We test our new training objective on datasets of two languages: English and Japanese, and empirically show that our method achieves improvement in parsing accuracy over the original DIORA.

“…Following the recommendations put forth by previous work that has done a comprehensive empirical evaluation on this topic (Li et al, 2020b), we report results on both length ≤ 10 as well as all-length test data.…”

Section: Discussion (mentioning)

confidence: 99%

Co-training an Unsupervised Constituency Parser with Weak Supervision

Maveli, Cohen

2021

Preprint

We introduce a method for unsupervised parsing that relies on bootstrapping classifiers to identify if a node dominates a specific span in a sentence. There are two types of classifiers, an inside classifier that acts on a span, and an outside classifier that acts on everything outside of a given span. Through self-training and co-training with the two classifiers, we show that the interplay between them helps improve the accuracy of both, and as a result, effectively parse. A seed bootstrapping technique prepares the data to train these classifiers. Our analyses further validate that such an approach in conjunction with weak supervision using prior branching knowledge of a known language (left/right-branching) and minimal heuristics injects strong inductive bias into the parser, achieving 63.1 F1 on the English (PTB) test set. In addition, we show the effectiveness of our architecture by evaluating on treebanks for Chinese (CTB) and Japanese (KTB) and achieve new state-of-the-art results.
