The Entropy of Words—Learnability and Expressivity across More than 1000 Languages

Bentz, Christian; Alikaniotis, Dimitrios; Cysouw, Michael; Ferrer-i-Cancho, Ramon

doi:10.3390/e19060275

Cited by 87 publications

(109 citation statements)

References 61 publications

Mentioning: 106


“…For technical reasons, we are limited to calculating mutual information based on the joint frequencies of part-of-speech pairs, rather than wordforms. The reason we use part-of-speech tags is that getting a reliable estimate of mutual information from observed frequencies of wordforms is statistically difficult, requiring very large samples to overcome bias (Archer, Park, & Pillow, 2013; Basharin, 1959; Bentz, Alikaniotis, Cysouw, & Ferrer-i-Cancho, 2017; Futrell et al., 2019; Miller, 1955; Paninski, 2003). The mutual information estimation problem is less severe, however, when we are looking at joint counts over coarser-grained categories, such that there is not a long tail of one-off forms.…”

Section: Word Order Preferences (mentioning)

confidence: 99%

Lossy‐Context Surprisal: An Information‐Theoretic Model of Memory Effects in Sentence Processing

2020


A key component of research on human sentence processing is to characterize the processing difficulty associated with the comprehension of words in context. Models that explain and predict this difficulty can be broadly divided into two kinds, expectation‐based and memory‐based. In this work, we present a new model of incremental sentence processing difficulty that unifies and extends key features of both kinds of models. Our model, lossy‐context surprisal, holds that the processing difficulty at a word in context is proportional to the surprisal of the word given a lossy memory representation of the context—that is, a memory representation that does not contain complete information about previous words. We show that this model provides an intuitive explanation for an outstanding puzzle involving interactions of memory and expectations: language‐dependent structural forgetting, where the effects of memory on sentence processing appear to be moderated by language statistics. Furthermore, we demonstrate that dependency locality effects, a signature prediction of memory‐based theories, can be derived from lossy‐context surprisal as a special case of a novel, more general principle called information locality.
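The estimation problem described in the citation statement above can be illustrated with a small sketch. It assumes the simplest estimator, the plug-in (maximum-likelihood) estimate of mutual information computed from joint counts; this is not claimed to be the procedure used in either paper. The function name `plugin_mutual_information` and the toy samples are hypothetical.

```python
import math
from collections import Counter


def plugin_mutual_information(pairs):
    """Plug-in (maximum-likelihood) estimate of I(X;Y) in bits.

    `pairs` is a list of observed (x, y) tuples. Probabilities are
    replaced by relative frequencies, which is what makes the
    estimate biased for small samples.
    """
    joint = Counter(pairs)
    n = sum(joint.values())
    left, right = Counter(), Counter()
    for (x, y), count in joint.items():
        left[x] += count
        right[y] += count
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (count / n) * math.log2(count * n / (left[x] * right[y]))
    return mi


# Hypothetical toy samples (not data from the cited papers).
# Coarse categories, every combination seen once: no dependence.
independent = [("DET", "NOUN"), ("DET", "VERB"), ("ADJ", "NOUN"), ("ADJ", "VERB")]
# Coarse categories, perfectly coupled.
dependent = [("DET", "NOUN"), ("DET", "NOUN"), ("ADJ", "VERB"), ("ADJ", "VERB")]
# Fine-grained forms that each occur exactly once (a "long tail").
one_off = [("w1", "v1"), ("w2", "v2"), ("w3", "v3"), ("w4", "v4")]

mi_independent = plugin_mutual_information(independent)
mi_dependent = plugin_mutual_information(dependent)
mi_one_off = plugin_mutual_information(one_off)
print(mi_independent, mi_dependent, mi_one_off)
```

When every observed pair is a one-off form, the plug-in estimate equals log2 of the sample size regardless of the true dependence between the two variables, which is one way to see why counts over coarser categories such as part-of-speech tags are easier to work with.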



“…Supporting this idea, studies on the developing Nicaraguan sign language have shown that complex linguistic structure emerges over multiple cohorts of learners (Senghas, Kita, & Ozyurek, 2004), and work on pidgins has suggested that new child learners are required in order to develop recursion (Bickerton, 1984). Second, it affects the reasoning and predictions made about the structure of human lexicons over time: from understanding trends in metaphorical mappings (Xu, Malt, & Srinivasan, 2017) to measuring the entropy and informativity of words (Bentz, Alikaniotis, Cysouw, & Ferrer-i-Cancho, 2017). Going beyond language evolution and change, this conclusion has already influenced work on a wide range of human behaviors.…”

Section: Introduction (mentioning)

confidence: 99%

Compositional structure can emerge without generational transmission

2019


Experimental work in the field of language evolution has shown that novel signal systems become more structured over time. In a recent paper, Kirby, Tamariz, Cornish, and Smith (2015) argued that compositional languages can emerge only when languages are transmitted across multiple generations. In the current paper, we show that compositional languages can emerge in a closed community within a single generation. We conducted a communication experiment in which we tested the emergence of linguistic structure in different micro-societies of four participants, who interacted in alternating dyads using an artificial language to refer to novel meanings. Importantly, the communication included two real-world aspects of language acquisition and use, which introduce compressibility pressures: (a) multiple interaction partners and (b) an expanding meaning space. Our results show that languages become significantly more structured over time, with participants converging on shared, stable, and compositional lexicons. These findings indicate that new learners are not necessary for the formation of linguistic structure within a community, and have implications for related fields such as developing sign languages and creoles.


“…So, first of all, it is necessary to prove that the text corresponds to this style. To do this, it is necessary to determine the unconditional semantic entropy (5) for the detection of journalistic style signs, using the technique [16], which will also be necessary in determining the n-grams [17] when forming a model of the propagandist's psycholinguistic portrait.…”

Section: Stage 2 a Typical Psycholinguistic Profile Constructing (mentioning)

confidence: 99%

The Quantum-Semantic Psycholinguistic Analysis Method for the English-Language Text of Propaganda Discourse

Тарасенко

2019

A.I.S.


Abstract. The actual scientific problem of increasing the effectiveness of counteracting the impact of information propaganda on the basis of English-language texts is solved in the article by creating the quantum-semantic psycholinguistic analysis method for the English-language text of propaganda discourse. The subject of the study deals with the methods of psycholinguistic analysis, as well as the methods of providing quantum-semantic text analysis. The method consists of five stages and includes the basic psycholinguistic properties definition, a typical psycholinguistic profile constructing, identifying the manipulative features of the text and correlating them with the psycholinguistic profile, the technological strategy of information warfare determination and forming a model of propagandist's psycholinguistic portrait on the basis of quantum-semantic analysis, which allows determining the features of text perception by using n-grams. At the same time, the approach of automated discourse analysis based on the advanced ID3 algorithm with the use of intensional logic was improved. The psycholinguistic peculiarities of text written exactly by the propagandist are determined by means of carrying out the first two method's stages through the solvation of the semantic particle determination inverted problem, based on the informant's survey method. The manipulative features detection in the psycholinguistic portrait is carried out on the basis of the psychological influence traces study in the text through analyzing the English manipulative constructs and stylistic-semantic entropy, which allows detecting manipulative deviations in text after correlating the set of semantic particles with a typical psycholinguistic profile.
The results of the study provide a propagandists' psycholinguistic portrait definition in order to carry out the further actions of reverse targeted impact on him, taking into account his psychological characteristics for reducing the subconscious resistance and thus increasing the effectiveness of counter-propaganda. This will increase the level of the state's information security.
Keywords: psycholinguistic analysis; information propaganda counteraction; quantum-semantic research; propaganda discourse; psychological influence detection; the model of psycholinguistic portrait.

