2007
DOI: 10.1075/ijcl.12.3.03san

The corpus, its users and their needs

Abstract: COMPARA is a bidirectional parallel corpus of English and Portuguese, currently with 3 million words. The corpus was launched in 2000 and at present it is possibly the largest edited parallel corpus publicly available on the Web, with roughly 6,000 corpus queries per month. This paper summarizes an analysis of six years of corpus use. We begin by looking at user studies for language resources, especially corpora, and then we provide a snapshot of COMPARA's users and their behaviour based on log analysis. Parti…

Cited by 12 publications (10 citation statements)
References 13 publications (11 reference statements)

“…Some of the difficulties encountered by novice corpus users in general are, however, described by Bernardini (2000), Kennedy & Miceli (2001), Frankenberg-Garcia (2005), Bianchi & Manca (2006) and Santos & Frankenberg-Garcia (2007). Although these studies differ quite substantially from one to another, they all converge to suggest that corpus skills that come as second nature to experts are not at all obvious to the untrained.…”
Section: Novice Corpus Users
Citation type: mentioning
confidence: 76%
“…For example, in the same study of log files from the COMPARA corpus (Santos & Frankenberg-Garcia 2007), we found records of queries reflecting serious misconceptions about the kind of information that can be retrieved from a corpus, including queries as absurd as the string ‘this still did not give me the happiness I thought it would or for which I sought’. Logs with queries such as ‘water shining’, ‘bill quantities’ and ‘like a manor’ also suggest that people who are new to corpora have very little idea of the way chunks of words behave.…”
Section: Novice Corpus Users
Citation type: mentioning
confidence: 99%