2003
DOI: 10.1162/089120103322711569

Introduction to the Special Issue on the Web as Corpus

Abstract: The Web, teeming as it is with language data, of all manner of varieties and languages, in vast quantity and freely available, is a fabulous linguists' playground. This special issue of Computational Linguistics explores ways in which this dream is being explored.

Cited by 532 publications (252 citation statements)
References: 9 publications
“…The world wide web, with its inexhaustible amount of natural language data, has become an established source for efficiently building large corpora (Kilgarriff and Grefenstette, 2003). Tools are available that make it convenient to bootstrap corpora from the web based on mere seed term lists, such as the BootCaT toolkit (Baroni and Bernardini, 2004).…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…However, one of the biggest problems encountered by these approaches is to obtain an amount of data that could be large enough for statistical and linguistic analysis. Taking into account the rapid growth of the Internet and the quantity of texts included in it, some researchers have proposed using the Web as a source for building corpora [7]. Two strategies have been proposed for exploiting the web with that objective in mind:…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…This paper extends [1] by intensive experimentation with Google as a resource for testing Spanish collocations (for English, the results would be even much more statistically significant). The Web is widely considered now as a huge (but noisy) linguistic resource [3,4]. To use it for malapropism detection and correction, we had to revise the earlier algorithm and to develop new threshold-based procedures.…”
Section: Introduction (citation type: mentioning)
confidence: 99%