Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.300
An Empirical Comparison of Unsupervised Constituency Parsing Methods

Abstract: Unsupervised constituency parsing aims to learn a constituency parser from a training corpus without parse tree annotations. While many methods have been proposed to tackle the problem, including statistical and neural methods, their experimental results are often not directly comparable due to discrepancies in datasets, data preprocessing, lexicalization, and evaluation metrics. In this paper, we first examine experimental settings used in previous work and propose to standardize the settings for better compa…

Cited by 15 publications (17 citation statements)
References 15 publications
“…Early work in unsupervised PCFG induction from raw text (Johnson et al., 2007; Liang et al., 2009; Tu, 2012) was not as successful as models of unsupervised constituency parsing (Seginer, 2007; Ponvert et al., 2011). However, recent work in unsupervised parsing (Shen et al., 2019; Drozdov et al., 2019, 2020) and grammar induction (Jin et al., 2018a, 2019; Zhu et al., 2020; Jin and Schuler, 2020; Li et al., 2020) shows much improvement over previous results with grammars learned solely from raw text, indicating that statistical regularities relevant to syntactic acquisition can be found in word collocations. For example, recent work proposes a word-based neural compound PCFG induction model for accurate grammar induction on English.…”
Section: Related Work
confidence: 83%
“…We report F1 scores on test sentences of length ≤ 10 and of all lengths. For the performance of the original DIORA, we rerun the experiments with the hyperparameters provided by Li et al. (2020). Since the predicted parse trees are binary, we also provide the upper bound of F1 scores without tree binarization for each dataset.…”
Section: Results
confidence: 99%
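The F1 scores mentioned above are unlabeled bracketing F1 over constituent spans, the standard metric in unsupervised constituency parsing evaluation. A minimal sketch of that computation, assuming constituents are represented as (start, end) index pairs (the function name and span format are illustrative, not from the paper):

```python
def bracket_f1(pred_spans, gold_spans):
    """Unlabeled bracketing F1 between predicted and gold constituent spans.

    Each span is a (start, end) token-index pair; duplicates are ignored
    because comparison is over sets of brackets.
    """
    pred, gold = set(pred_spans), set(gold_spans)
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)          # brackets present in both trees
    precision = tp / len(pred)
    recall = tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: a 4-word sentence; the binary predicted tree posits one
# bracket, (0, 2), that the flatter gold tree lacks.
pred = [(0, 4), (0, 2), (2, 4)]
gold = [(0, 4), (2, 4)]
print(round(bracket_f1(pred, gold), 3))  # → 0.8
```

Because predicted trees are binary while gold trees need not be, a binary prediction can never recover all brackets of a flatter gold tree, which is why the statement above also reports an upper bound without tree binarization.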
“…Following the settings of Li et al. (2020), we preprocessed the corpora. For each language, we run two experiments: one with punctuation and one without.…”
Section: Datasets and Setting
confidence: 99%
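The "with vs. without punctuation" setting typically means dropping punctuation-only tokens before training and evaluation. A hedged sketch of that preprocessing step (the helper name is illustrative; the cited papers do not specify this exact implementation):

```python
import string

def strip_punct(tokens):
    """Remove tokens that consist solely of punctuation characters."""
    return [t for t in tokens
            if not all(ch in string.punctuation for ch in t)]

sent = ["the", "cat", ",", "sadly", ",", "left", "."]
print(strip_punct(sent))  # → ['the', 'cat', 'sadly', 'left']
```

Running both variants matters because punctuation marks often align with constituent boundaries, so including them can inflate or deflate unsupervised parsing scores depending on the model.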
“…Following the recommendations put forth by previous work that has done a comprehensive empirical evaluation on this topic (Li et al., 2020b), we report results on both length ≤ 10 as well as all-length test data.…”
Section: Discussion
confidence: 99%