An empirical study on about 1.7 million dictionary words from seven languages viz. English, French, Dutch, Spanish, Italian, Hindi, and German has been conducted. Three intriguing characteristic features have been analyzed. First, the alphabet usage pattern in a language was determined which can be used to give an idea on how alphabets have been employed. For instance, the alphabet ‘e’ is highly used in English, while ‘q’ is least used. Second, the average and range of word lengths in the languages were computed and seen to vary from 1 to 37. Average word lengths were computed in the range (6.665–11.14). For comparison, word lengths have been fitted using Gaussian distribution. Third, a new measure was derived; which we termed ‘Language Sparsity’; computed as one minus ratio of number of words of a particular length already existing to the total number of possible words that can be formed. Sparsity hence gives a measure of the scope of fruition in languages. Two such measures have been defined: a weighted and a nonweighted sparsity. Nonweighted sparsity was found to be minimum (0.877) for English and maximum (0.982) for Dutch. The results obtained can play a significant role in propagating the synergy of language evolution.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.