2005
DOI: 10.1080/09296170500172478

Basic Quantitative Characteristics of the Modern Greek Language Using the Hellenic National Corpus

Abstract: Modern Greek is one of the least quantitatively studied modern European languages, and the goal of this paper is to fill this relative void. We use the Hellenic National Corpus (HNC), a growing corpus that currently includes 33 million words. The corpus and all the tools used in our work were developed by the Institute for Language and Speech Processing (ILSP). In this paper we focus on three main areas: the lists of the 1000 most common words and lemmas, word length, and letter frequency. We also make s…

Cited by 10 publications (13 citation statements)
References 13 publications
“…However, about 50% of the summed word frequencies were accounted for by words with 4 or fewer letters, leading to the conclusion that the distribution of word length in Greek is compatible with Zipf's (1949) "principle of least effort." The average word in terms of frequency was found to be 5.7 letters long, comparable to that reported for the HNC (Hatzigeorgiou et al., 2001; Mikros, Hatzigeorgiou, & Karayiannis, 2005).…”
supporting
confidence: 68%
“…So far, the only published frequency norms for Modern Greek correspond to GreekLex, a database recently created by Ktori et al. (2008) containing more than 35,000 different entries taken from a 47 million-word corpus of written texts (Hellenic National Corpus; HNC, Mikros et al., 2001; Hatzigeorgiu et al., 2005). The earliest sources on which the HNC is based date from 1990 and were mainly gathered from newspapers (61.3%), books (9.4%), periodicals (5.9%) and other written texts (23.1%) covering a relatively large variety of topics.…”
Section: Introduction
mentioning
confidence: 99%
“…Comparison of different truly human languages arising from apparently different origins or containing different signs has also been made, e.g. beside english, one can find references about greek [20,21,22], turkish [23], chinese [24], ... "Linguistic time series" have often studied at a letter or word level [25,26,27,28] or as in Montemurro and Pury [27,28] at a frequency mapping, similar though not identical to the one described below. Others have considered Zipf law(s) at the sentence level [29,30], -a few sometimes strangely neglecting the punctuation [31,32].…”
Section: Introduction
mentioning
confidence: 99%