Proceedings of the 2nd Workshop on Learning Language in Logic and the 4th Conference on Computational Natural Language Learning 2000
DOI: 10.3115/1117601.1117611

Memory-based learning for article generation

Abstract: Article choice can pose difficult problems in applications such as machine translation and automated summarization. In this paper, we investigate the use of corpus data to collect statistical generalizations about article use in English in order to be able to generate articles automatically to supplement a symbolic generator. We use data from the Penn Treebank as input to a memory-based learner (TiMBL 3.0; Daelemans et al., 2000) which predicts whether to generate an article with respect to an English base noun…
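The abstract describes a memory-based learner: training instances are stored verbatim and a new base noun phrase receives the class of its most similar stored instances. As a rough illustration of that idea only (this is not TiMBL and not the paper's feature set), the sketch below implements IB1-style k-nearest-neighbour classification with the overlap metric. The feature layout (head noun, POS tag, first prenominal modifier, Penn Treebank function tag), the three labels `a/an`, `the` and `none`, and the stored instances are hypothetical toy values chosen for the example.

```python
from collections import Counter
from collections.abc import Sequence

# Hypothetical feature layout (NOT the paper's actual features):
# (head noun, head POS tag, first prenominal modifier or "-", function tag)
Instance = tuple[str, ...]


def overlap_distance(a: Instance, b: Instance) -> int:
    """Overlap metric: number of feature positions whose values differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def knn_classify(memory: Sequence[tuple[Instance, str]],
                 query: Instance, k: int = 1) -> str:
    """Return the majority label among the k stored instances nearest to query.

    In this sketch, equal distances keep storage order (sorted() is stable) and
    vote ties go to the label seen first among the k neighbours.
    """
    ranked = sorted(memory, key=lambda item: overlap_distance(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy "memory" of labelled base NPs (invented for illustration).
memory = [
    (("dog", "NN", "big", "SBJ"), "the"),
    (("dog", "NN", "-", "OBJ"), "a/an"),
    (("dogs", "NNS", "-", "OBJ"), "none"),
    (("water", "NN", "-", "OBJ"), "none"),
    (("president", "NN", "-", "SBJ"), "the"),
]

if __name__ == "__main__":
    print(knn_classify(memory, ("dogs", "NNS", "-", "SBJ")))   # none
    print(knn_classify(memory, ("cat", "NN", "-", "OBJ")))     # a/an
    print(knn_classify(memory, ("cat", "NN", "-", "OBJ"), k=3))  # none
```

With k=1 the unseen noun "cat" inherits the label of the first equally close instance ("dog" as object); widening to k=3 lets the vote shift. TiMBL itself offers feature weighting (such as gain ratio) and other similarity metrics (such as MVDM), which this sketch leaves out.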

Cited by 31 publications (36 citation statements); references 9 publications.
“…Knight and Chander [25] took the first step to using a machine learning algorithm for article generation although their method only deals with a/the selection. Minnen et al [26] extend this work to three-way classification. However, their method also depends on information such as functional tags in Penn Treebank which may not be reliable in essay writing.…”
Section: Relation to Previous Work | Type: mentioning | Confidence: 99%
“…Our approach significantly improves upon the work of Minnen et al (2000). We also use additional automatically parsed data from the North American News Text Corpus (Graff, 1995), further improving our results.…”
Section: Introduction | Type: mentioning | Confidence: 99%
“…As with (Minnen et al, 2000), we train the language model on the Penn Treebank (Marcus et al, 1993). As far as we know, language modeling always improves with additional training data, so we add data from the North American News Text Corpus (NANC) (Graff, 1995) automatically parsed with the Charniak parser to train our language model on up to 20 million additional words.…”
Section: Training the Model | Type: mentioning | Confidence: 99%
“…At the syntactic sentence level TiMBL has been applied to part-of-speech tagging (…; Zavrel and Daelemans, 1999; Van Halteren, Zavrel, and Daelemans, 2001); PP attachment (Zavrel, Daelemans, and Veenstra, 1997); subcategorization (Buchholz, 1998); phrase chunking (Veenstra, 1998; Tjong Kim Sang and Veenstra, 1999); shallow parsing (…; Buchholz, Veenstra, and Daelemans, 1999; Yeh, 2000); clause identification (Orăsan, 2000; Tjong Kim Sang, 2001); detecting the scope of negation markers (Morante, Liekens, and Daelemans, 2008); sentence-boundary detection (Stevenson and Gaizauskas, 2000); predicting the order of prenominal adjectives for generation (Malouf, 2000) and article generation (Minnen, Bond, and Copestake, 2000); and, beyond the sentence level, to anaphora resolution (Preiss, 2002; Mitkov, Evans, and Orasan, 2002; Hoste, 2005). More recently, memory-based learning has been integrated as a classifier engine in more complicated dependency parsing systems (Nivre, Hall, and Nilsson, 2004; Sagae and Lavie, 2005; …), or dependency parsing in combination with semantic role labeling (Morante, Van Asch, and Van den Bosch, 2009).…”
Section: NLP Applications of TiMBL | Type: mentioning | Confidence: 99%