1999
DOI: 10.1017/s1351324900002308
Evaluating two methods for Treebank grammar compaction

Abstract: Treebanks, such as the Penn Treebank, provide a basis for the automatic creation of broad coverage grammars. In the simplest case, rules can simply be 'read off' the parse-annotations of the corpus, producing either a simple or probabilistic context-free grammar. Such grammars, however, can be very large, presenting problems for the subsequent computational costs of parsing under the grammar. In this paper, we explore ways by which a treebank grammar can be reduced in size or 'compacted', which involve the use…
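To make the 'read off' step concrete, here is a minimal sketch of extracting a probabilistic context-free grammar from bracketed Penn Treebank-style parses, with rule probabilities estimated by relative frequency. This is an illustration only, not the paper's code; the bracket-handling details and the toy example tree are assumptions.

```python
from collections import Counter, defaultdict

def parse_bracketed(s):
    """Parse one Penn Treebank-style bracketed string into (label, children)
    tuples. Leaves are plain strings (the words)."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0
    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]; pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos]); pos += 1
        pos += 1  # consume ")"
        return (label, children)
    return read()

def read_off_rules(tree, rules):
    """Collect one CFG rule per internal node: parent -> child labels."""
    label, children = tree
    rhs = []
    for child in children:
        if isinstance(child, tuple):   # internal node
            rhs.append(child[0])
            read_off_rules(child, rules)
    if rhs:                            # skip preterminal -> word expansions
        rules[(label, tuple(rhs))] += 1

def pcfg_from_treebank(bracketed_parses):
    """Relative-frequency estimate: P(A -> beta) = count(A -> beta) / count(A)."""
    rules = Counter()
    for s in bracketed_parses:
        read_off_rules(parse_bracketed(s), rules)
    lhs_totals = defaultdict(int)
    for (lhs, _), n in rules.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in rules.items()}

# Toy example (invented parse, not from the Penn Treebank):
trees = ["(S (NP (DT the) (NN cat)) (VP (VBZ sleeps)))"]
for (lhs, rhs), p in pcfg_from_treebank(trees).items():
    print(f"{lhs} -> {' '.join(rhs)}  [{p:.2f}]")
```

Run over a real treebank, this procedure yields the very large rule sets whose parsing cost motivates the compaction the paper evaluates.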

Cited by 4 publications (2 citation statements). References 13 publications.

Citation statements:
“…To integrate and experiment with a PCFG backbone we developed initial PCFG extraction tools that can process treebanks of the Penn Treebank format (Marcus et al., 1993). For languages that have existing treebanks, such as English and Mandarin Chinese, we can generate a PCFG and compact the rules for the parser (Krotov et al., 1999).…”
Section: Development Plan
confidence: 99%
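The truncated abstract above leaves the exact compaction methods unstated, but one strategy associated with Krotov et al. is to discard any rule whose right-hand side can itself be parsed by the remaining rules, so that removing it loses no coverage. The sketch below illustrates that idea only; the grammar representation and toy rules are invented, and it should not be read as the paper's implementation.

```python
def is_redundant(rule, grammar):
    """True if the rule's right-hand side can be parsed into its left-hand
    side using only the *other* rules in the grammar."""
    lhs, rhs = rule
    others = [r for r in grammar if r != rule]
    memo = {}

    def derives(sym, i, j):
        """Can `sym` derive the nonterminal span rhs[i:j]?"""
        key = (sym, i, j)
        if key in memo:       # revisits (e.g. unary-rule cycles) fail
            return memo[key]  # conservatively: sound, may miss some cases
        memo[key] = False
        if j - i == 1 and rhs[i] == sym:
            memo[key] = True  # a symbol trivially derives itself
        else:
            for l, r in others:
                if l == sym and matches(r, 0, i, j):
                    memo[key] = True
                    break
        return memo[key]

    def matches(seq, k, i, j):
        """Can seq[k:] be split so each symbol derives a piece of rhs[i:j]?"""
        if k == len(seq):
            return i == j
        return any(derives(seq[k], i, m) and matches(seq, k + 1, m, j)
                   for m in range(i + 1, j + 1))

    return derives(lhs, 0, len(rhs))

# Toy grammar: the three-symbol NP rule is derivable from the other rules
# (NP -> NP PP over NP -> DT NN), so compaction can drop it.
grammar = [
    ("NP", ("DT", "NN")),
    ("NP", ("NP", "PP")),
    ("NP", ("DT", "NN", "PP")),
    ("PP", ("IN", "NP")),
]
compacted = [r for r in grammar if not is_redundant(r, grammar)]
print(compacted)   # the ("NP", ("DT", "NN", "PP")) rule is removed
```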
“…the rate at which new tags are discovered as more text is processed was examined (see e.g. Krotov et al. (1999) for further explanation of accession rates). One might expect that as more text is processed, the number of new tags added per text will be smaller.…”
Section: How Much Training Data Is Necessary?
confidence: 99%
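The accession rate mentioned here can be operationalised by processing a corpus incrementally and counting how many previously unseen tags (or rules) each successive text contributes; a flattening curve suggests diminishing returns from more training data. A minimal sketch, with an invented `extract` function and toy data standing in for real treebank input:

```python
def accession_curve(texts, extract=lambda text: text.split()):
    """Number of previously unseen items (e.g. tags or rules) contributed
    by each successive text, in processing order."""
    seen, curve = set(), []
    for text in texts:
        items = set(extract(text))
        curve.append(len(items - seen))
        seen |= items
    return curve

# Toy illustration with invented tag sequences; real input would be the
# tags (or rules) extracted from successive treebank files.
texts = ["DT NN VBZ", "DT NN VBZ IN", "DT JJ NN VBZ"]
print(accession_curve(texts))   # [3, 1, 1] -> accession rate tails off
```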