1997
DOI: 10.1075/ijcl.2.2.06san

Predictability of Word Forms (Types) and Lemmas in Linguistic Corpora. A Case Study Based on the Analysis of the CUMBRE Corpus

Abstract: Various research centres and publishing companies all around the world have been developing corpus resources for many years, and there has been a growing awareness throughout the eighties of their importance to linguistic and lexicographic work. To give some idea of scale, the British National Corpus contains 100 million words, and its counterpart for Spanish, compiled by the Spanish Real Academia de la Lengua, will reach 100 million words at first and 200 million words in a second stage. However, little convinc…

Citations: cited by 25 publications (7 citation statements)
References: 0 publications
“…billion tokens) is needed, we might put more emphasis on balance and coverage; (c) to induce the relationship between lemmas and tokens, types and tokens, and lemmas and types. The usefulness of these dependencies becomes evident in many ways, in particular whenever we want to contrast regularity patterns regarding specific linguistic areas (e.g., written versus spoken, Spanish versus Korean, newspapers versus fiction, see [13], [20]; and finally (d) to represent any monotone, convex up, increasing curve that cannot be represented by single functions. That is, whenever the error term is too big to be represented by a single function.…”
Section: Discussion (mentioning)
confidence: 99%
“…This suggests that even if we were to find a function that could fit some given data (corpus), there would still be no guarantee that the function would always hold. This is precisely the common major flaw of Young-Mi Jeong's and Sánchez and Cantos' researches within the field of computational linguistics [12], [13]. In Section III-3, we shall delineate experimentally this problem in more detail.…”
Section: Related Work (mentioning)
confidence: 99%