Proceedings of the Tenth Conference of the European Chapter of the Association for Computational Linguistics (EACL '03), 2003
DOI: 10.3115/1067807.1067851
Bootstrapping statistical parsers from small datasets

Abstract: We present a practical co-training method for bootstrapping statistical parsers using a small amount of manually parsed training material and a much larger pool of raw sentences. Experimental results show that unlabelled sentences can be used to improve the performance of statistical parsers. In addition, we consider the problem of bootstrapping parsers when the manually parsed training material is in a different domain to either the raw sentences or the testing material. We show that bootstrapping continues t…
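As a rough illustration of the procedure the abstract describes, the following is a minimal sketch of a parser co-training loop: two parsers trained on the same small treebank each parse a batch of raw sentences, and each parser's most confident output is added to the other's training set. The parser objects, their method names (train, parse, score), and the selection heuristic are assumptions made for illustration, not details taken from the paper.

    # Minimal sketch of a parser co-training loop. Assumes two hypothetical
    # parser objects exposing .train(treebank), .parse(sentence) -> tree,
    # and .score(tree) -> float. Selection heuristic is illustrative only.

    def co_train_parsers(parser_a, parser_b, labelled, raw_pool,
                         rounds=10, batch=30, top_k=20):
        """Bootstrap two parsers from a small treebank and raw sentences."""
        treebank_a = list(labelled)   # each parser keeps its own training set
        treebank_b = list(labelled)
        for _ in range(rounds):
            if not raw_pool:
                break
            cache, raw_pool = raw_pool[:batch], raw_pool[batch:]
            # Each parser labels the cache; keep its most confident parses.
            parsed_a = sorted((parser_a.parse(s) for s in cache),
                              key=parser_a.score, reverse=True)[:top_k]
            parsed_b = sorted((parser_b.parse(s) for s in cache),
                              key=parser_b.score, reverse=True)[:top_k]
            # Cross-training: each parser learns from the other's output,
            # so the two models act as teachers for one another.
            treebank_a.extend(parsed_b)
            treebank_b.extend(parsed_a)
            parser_a.train(treebank_a)
            parser_b.train(treebank_b)
        return parser_a, parser_b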

Cited by 94 publications (77 citation statements)
References 9 publications
“…However, although co-training has been used in many domains such as statistical parsing and noun phrase identification [22], [29], [33], [38], in most scenarios the requirement of sufficient and redundant views, or even the requirement of sufficient redundancy, could not be met. Therefore, researchers attempt to develop variants of the co-training algorithm for relaxing such a requirement.…”
Section: Semi-supervised Learning (mentioning; confidence: 99%)
“…This algorithm employs two regressors, each of which labels the unlabeled data for the other during the learning process. In order to choose appropriate unlabeled examples to label, COREG estimates the labeling confidence by consulting the influence of the labeling of unlabeled examples on the labeled examples. The final prediction is made by combining the regression estimates generated by both regressors.…”
Section: Introduction (mentioning; confidence: 99%)
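The confidence criterion this statement describes can be sketched as follows: a candidate unlabeled example is trusted if self-labeling it would reduce squared error on its labeled neighbours. The kNN learner and the error-reduction test follow the general COREG idea, but the function below is an illustrative reconstruction using scikit-learn's KNeighborsRegressor, not the authors' implementation; all variable names are mine.

    # Illustrative sketch of COREG-style labeling confidence, assuming
    # scikit-learn's KNeighborsRegressor. Not the authors' code.
    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    def labeling_confidence(X_lab, y_lab, x_u, k=3):
        """Estimate how much self-labeling x_u would help the labeled set.

        A positive return value means adding (x_u, predicted y) reduces
        squared error on x_u's labeled neighbours, i.e. the labeling is
        considered confident.
        """
        h = KNeighborsRegressor(n_neighbors=k).fit(X_lab, y_lab)
        y_u = h.predict([x_u])[0]                     # self-labeled value
        # Refined regressor trained with the newly labeled example added.
        h2 = KNeighborsRegressor(n_neighbors=k).fit(
            np.vstack([X_lab, [x_u]]), np.append(y_lab, y_u))
        # Influence of the labeling on x_u's labeled neighbours.
        idx = h.kneighbors([x_u], return_distance=False)[0]
        before = (y_lab[idx] - h.predict(X_lab[idx])) ** 2
        after = (y_lab[idx] - h2.predict(X_lab[idx])) ** 2
        return float(np.sum(before - after)), y_u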
“…One actively researched approach to this problem is to develop weakly supervised algorithms that require less training data, such as active learning (Hermjakob and Mooney 1997; Tang et al. 2002; Baldridge and Osborne 2003; Hwa 2004) and co-training (Sarkar 2001; Steedman et al. 2003). In this article, we explore an alternative: using parallel text as a means for transferring syntactic knowledge from a resource-rich language to a language with fewer resources.…”
Section: Introduction (mentioning; confidence: 99%)
“…Co-training (Blum and Mitchell, 1998), and several variants of co-training, have been applied to a number of NLP problems, including word sense disambiguation (Yarowsky, 1995), named entity recognition (Collins and Singer, 1999), noun phrase bracketing (Pierce and Cardie, 2001) and statistical parsing (Sarkar, 2001; Steedman et al., 2003). In each case, co-training was used successfully to bootstrap a model from only a small amount of labelled data and a much larger pool of unlabelled data.…”
Section: Introduction (mentioning; confidence: 99%)
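For readers unfamiliar with the Blum and Mitchell (1998) setting these papers build on, here is a minimal sketch of two-view co-training: two classifiers trained on disjoint feature views of the same data, each moving its most confident unlabeled examples into the shared labeled set. The Gaussian naive Bayes learners, pool handling, and growth sizes below are illustrative choices, not details from any of the cited papers.

    # Minimal two-view co-training sketch (after Blum and Mitchell, 1998),
    # using scikit-learn naive Bayes classifiers as the two view learners.
    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    def co_train(X1, X2, y, U1, U2, rounds=10, per_view=2):
        """X1/X2 and U1/U2 are the two views of labeled/unlabeled data."""
        X1, X2, y = np.asarray(X1), np.asarray(X2), np.asarray(y)
        U1, U2 = np.asarray(U1), np.asarray(U2)
        for _ in range(rounds):
            c1 = GaussianNB().fit(X1, y)
            c2 = GaussianNB().fit(X2, y)
            if len(U1) == 0:
                break
            # Each view classifier nominates its most confident examples.
            chosen = []
            for clf, U in ((c1, U1), (c2, U2)):
                proba = clf.predict_proba(U)
                top = np.argsort(proba.max(axis=1))[-per_view:]
                chosen.extend((i, clf.classes_[proba[i].argmax()])
                              for i in top)
            idx = np.array(sorted({i for i, _ in chosen}))
            labels = dict(chosen)
            pseudo = np.array([labels[i] for i in idx])
            # Move the chosen examples (both views) into the labeled set.
            X1 = np.vstack([X1, U1[idx]])
            X2 = np.vstack([X2, U2[idx]])
            y = np.concatenate([y, pseudo])
            keep = np.setdiff1d(np.arange(len(U1)), idx)
            U1, U2 = U1[keep], U2[keep]
        return c1, c2

In this classifier setting the "views" are disjoint feature splits; in the parser bootstrapping of Sarkar (2001) and Steedman et al. (2003), the role of the two views is played instead by two different parsing models, as in the sketch after the abstract above.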