Proceedings of the 2001 Workshop on Computational Natural Language Learning (ConLL '01)
DOI: 10.3115/1117822.1117831
Unsupervised induction of stochastic context-free grammars using distributional clustering

Abstract: An algorithm is presented for learning a phrase-structure grammar from tagged text. It clusters sequences of tags together based on local distributional information, and selects clusters that satisfy a novel mutual information criterion. This criterion is shown to be related to the entropy of a random variable associated with the tree structures, and it is demonstrated that it selects linguistically plausible constituents. This is incorporated in a Minimum Description Length algorithm. The evaluation of unsupe…
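The criterion described in the abstract can be read, roughly, as the mutual information between the tag immediately to the left and the tag immediately to the right of a candidate constituent. The sketch below is a minimal illustration of that reading, not the paper's implementation; the input format (a list of (left_tag, right_tag) context pairs) and the function name are assumptions made here for the example.

```python
from collections import Counter
import math

def context_mutual_information(occurrences):
    """Estimate I(L; R): the mutual information between the tag immediately
    before and the tag immediately after a candidate tag sequence.

    `occurrences` is a list of (left_tag, right_tag) pairs observed around
    the candidate sequence in a tagged corpus (hypothetical input format).
    """
    n = len(occurrences)
    joint = Counter(occurrences)
    left = Counter(l for l, _ in occurrences)
    right = Counter(r for _, r in occurrences)
    mi = 0.0
    for (l, r), c in joint.items():
        p_lr = c / n
        mi += p_lr * math.log2(p_lr / ((left[l] / n) * (right[r] / n)))
    return mi

# A sequence whose left and right contexts are strongly coupled scores high;
# one whose contexts are statistically independent scores near zero.
coupled = [("DT", "VBZ"), ("DT", "VBZ"), ("IN", "MD"), ("IN", "MD")]
independent = [("DT", "VBZ"), ("DT", "MD"), ("IN", "VBZ"), ("IN", "MD")]
print(context_mutual_information(coupled))      # 1.0 bit
print(context_mutual_information(independent))  # 0.0 bits
```

Under this reading, sequences whose flanking contexts carry information about each other are favoured, which is the property the abstract appeals to when it says the criterion selects linguistically plausible constituents.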

Cited by 106 publications (64 citation statements); references 11 publications.
“…Bod 2009 and references therein) and does thus not constrain the computational realization of the statistical induction processes underlying language learning (cf. Clark, 2001; Klein and Manning, 2002; Zuidema, 2006; Bod & Smets, 2012; inter alia). In this paper, we were interested to what extent advanced L2 learners have succeeded in identifying generalizations pertaining to variables that figure in psycholinguistic accounts of sentence-level processing (e.g.…”
Section: Discussion (mentioning)
confidence: 99%
“…Grammar induction (Clark, 2001; Klein and Manning, 2002; Klein and Manning, 2004; Haghighi and Klein, 2006; Smith and Eisner, 2006; Snyder et al., 2009, inter alios) involves the learning of grammars from unlabeled sentences. Here, unlabeled means that the sentences are often POS tagged, but no syntactic structures for the sentences are available.…”
Section: Related Work (mentioning)
confidence: 99%
“…They subsequently cluster syntactic units until the grammar has been constructed. For example, EMILE [1] clusters expressions that occur in the same context, while CDC [10] creates sets of sequences within a context before selecting clusters that satisfy the MDL principle (see above).…”
Section: Grammar Inference (mentioning)
confidence: 99%
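The clustering step referred to in the excerpt above (grouping tag sequences that occur in similar local contexts) can be sketched as follows. This is a simplified illustration under assumed data structures, not the CDC or EMILE implementation: the corpus format, the '#' boundary marker, and the smoothed, symmetrised KL divergence are all choices made here for the example.

```python
from collections import Counter
import math

def context_distribution(corpus, sequence):
    """Count (left_tag, right_tag) contexts of a tag sequence in a tagged corpus.
    `corpus` is a list of tag lists; '#' marks a sentence boundary (assumed format)."""
    dist = Counter()
    k = len(sequence)
    for tags in corpus:
        for i in range(len(tags) - k + 1):
            if tuple(tags[i:i + k]) == tuple(sequence):
                left = tags[i - 1] if i > 0 else "#"
                right = tags[i + k] if i + k < len(tags) else "#"
                dist[(left, right)] += 1
    return dist

def divergence(p, q, eps=1e-9):
    """Symmetrised, smoothed KL divergence between two context distributions;
    one of several reasonable dissimilarity measures for distributional clustering."""
    keys = set(p) | set(q)
    n_p, n_q = sum(p.values()), sum(q.values())

    def kl(a, n_a, b, n_b):
        return sum((a[key] / n_a) * math.log2((a[key] / n_a + eps) / (b[key] / n_b + eps))
                   for key in keys if a[key] > 0)

    return kl(p, n_p, q, n_q) + kl(q, n_q, p, n_p)

# Sequences with similar context distributions (low divergence) are candidates
# for the same cluster, i.e. for rewriting from the same nonterminal.
corpus = [["DT", "NN", "VBZ", "DT", "JJ", "NN"], ["DT", "NN", "MD", "VB"]]
np_like = context_distribution(corpus, ("DT", "NN"))
np_mod = context_distribution(corpus, ("DT", "JJ", "NN"))
print(divergence(np_like, np_mod))
```

Any standard clustering routine over such divergences would then group the candidate sequences; the quoted description of CDC additionally filters the resulting clusters against the MDL principle.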
“…The principle finds its primary application in data reduction, where "any regularity in a given set of data can be used to compress the data" [20]. Examples include CDC [10] and e-GRIDS [38]. Greedy search algorithms make decisions based on their internal logic which may lead to the creation, removal or fusion of rules.…”
Section: Grammar Inference (mentioning)
confidence: 99%
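The compression reading of MDL quoted above translates directly into a two-part score: the bits needed to encode the grammar plus the bits needed to encode the corpus given the grammar. The sketch below is schematic, using an assumed fixed per-symbol code and a hypothetical function name; it is not the encoding used by CDC or e-GRIDS.

```python
def description_length(grammar_rules, corpus_log2_likelihood, bits_per_symbol=4.0):
    """Two-part MDL score: grammar bits plus data bits.

    `grammar_rules` is a list of (lhs, rhs) productions with rhs a tuple of symbols;
    each symbol is charged `bits_per_symbol`, a stand-in for a real code over the
    symbol alphabet (assumed). `corpus_log2_likelihood` is the log2-probability of
    the corpus under the grammar, so its negation is the data cost in bits.
    """
    grammar_bits = sum((1 + len(rhs)) * bits_per_symbol for _, rhs in grammar_rules)
    return grammar_bits - corpus_log2_likelihood

# A new rule is kept only if the bits it adds to the grammar are repaid by a
# shorter encoding of the data (i.e. a higher corpus likelihood).
before = description_length([("S", ("NP", "VP"))], corpus_log2_likelihood=-5000.0)
after = description_length([("S", ("NP", "VP")), ("NP", ("DT", "NN"))],
                           corpus_log2_likelihood=-4900.0)
print("accept new rule" if after < before else "reject new rule")  # accept new rule
```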