Human infants, like immature members of any species, must be highly selective in sampling information from their environment to learn efficiently. Failure to be selective would waste precious computational resources on material that is already known (too simple) or unknowable (too complex). In two experiments with 7- and 8-month-olds, we measured infants' visual attention to sequences of events varying in complexity, as determined by an ideal learner model. Infants' probability of looking away was greatest on stimulus items whose complexity (negative log probability) according to the model was either very low or very high. These results suggest a principle of infant attention that may have broad applicability: infants implicitly seek to maintain intermediate rates of information absorption and avoid wasting cognitive resources on overly simple or overly complex events.
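To make the complexity measure concrete, here is a minimal sketch of an ideal-learner computation, assuming a Dirichlet-multinomial predictor over discrete events; the prior parameter `alpha`, the event coding, and the function name are illustrative assumptions, not the paper's exact model (which also conditions on sequential structure).

```python
import numpy as np

def item_complexities(sequence, num_outcomes, alpha=1.0):
    """Negative log predictive probability (surprisal, in bits) of each
    event in a sequence under a Dirichlet-multinomial ideal learner."""
    counts = np.full(num_outcomes, alpha)  # Dirichlet pseudo-counts
    complexities = []
    for event in sequence:
        p = counts[event] / counts.sum()   # posterior predictive probability
        complexities.append(-np.log2(p))   # complexity = negative log probability
        counts[event] += 1                 # update beliefs with the observed event
    return complexities

# Example: a 3-outcome display; repetitions become progressively less complex,
# while a rare event spikes in complexity.
print(item_complexities([0, 0, 0, 2, 0], num_outcomes=3))
```

Under this kind of scheme, a repeated event's surprisal falls with each occurrence while a novel event's surprisal is high, giving the "too simple" and "too complex" ends of the U-shaped look-away pattern.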
Sentence processing theories typically assume that the input to our language processing mechanisms is an error-free sequence of words. However, this assumption is an oversimplification, because noise is present in typical language use (for instance, due to a noisy environment, producer errors, or perceiver errors). A complete theory of human sentence comprehension therefore needs to explain how humans understand language given imperfect input. Indeed, like many cognitive systems, language processing mechanisms may even be "well designed," in this case for the task of recovering intended meaning from noisy utterances. In particular, comprehension mechanisms may be sensitive to the types of information that an idealized statistical comprehender would be sensitive to. Here, we evaluate four predictions about such a rational (Bayesian) noisy-channel language comprehender in a sentence comprehension task: (i) semantic cues should pull sentence interpretation towards plausible meanings, especially if the wording of the more plausible meaning is close to the observed utterance in terms of the number of edits; (ii) this process should treat insertions and deletions asymmetrically, as a consequence of the Bayesian "size principle"; such nonliteral interpretation of sentences should (iii) increase with the perceived noise rate of the communicative situation and (iv) decrease if semantically anomalous meanings are more likely to be communicated. These predictions are borne out, strongly suggesting that human language comprehension relies on rational statistical inference over a noisy channel.

Traditionally, models of sentence comprehension have assumed that the input to the sentence comprehension mechanism is an error-free sequence of words (e.g., refs. 1-6). However, the noise inherent in normal language use, due to producer or perceiver errors, makes this assumption an oversimplification. For example, language producers may misspeak or mistype because they do not have full command of the language (because they are very young, are nonnative speakers, or suffer from a language disorder such as aphasia), because they have not fully planned an utterance in advance, or because they are trying to communicate under stress or confusion. Similarly, language comprehenders may mishear or misread because of their own limitations (e.g., poor hearing or sight, or insufficient attention), because the environment is noisy, or because the producer is not making sufficient effort to communicate clearly (e.g., whispering, mumbling, or writing in sloppy handwriting). Given the prevalence of these noise sources, it is plausible that language processing mechanisms are well adapted to handling noisy input, and so a complete model of language comprehension must allow for the existence of noise.

Noisy-channel models (7) of speech perception have been prominent in the literature for many years (e.g., refs. 8-11). Furthermore, several researchers have observed the importance of noise in the input for the compositional, syntactic processes of sentence understanding...
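As a rough illustration of the inference behind these predictions, here is a minimal sketch of a noisy-channel posterior over candidate intended sentences. The edit-based likelihood, the specific noise probabilities (`p_del`, `p_ins`, `vocab`), and the toy priors are all assumptions for illustration, not the paper's fitted model; the key point is that a specific inserted word must also be picked out of the whole vocabulary, which is the "size principle" asymmetry.

```python
import difflib
import math

def log_noise(observed, intended, p_del=0.01, p_ins=0.01, vocab=10_000):
    """Toy log P(observed | intended): each word lost to noise costs
    log p_del; each word added by noise costs log(p_ins / vocab), since
    the noise must also choose that particular word (size principle)."""
    sm = difflib.SequenceMatcher(a=intended.split(), b=observed.split())
    lp = 0.0
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == 'delete':             # intended word missing from observed
            lp += (i2 - i1) * math.log(p_del)
        elif op == 'insert':           # observed word absent from intended
            lp += (j2 - j1) * math.log(p_ins / vocab)
        elif op == 'replace':          # treat as a deletion plus an insertion
            lp += (i2 - i1) * math.log(p_del)
            lp += (j2 - j1) * math.log(p_ins / vocab)
    return lp

def posterior(observed, candidates):
    """Unnormalized log posterior over candidate intended sentences:
    log prior (semantic plausibility) + log noise likelihood."""
    return {s: lp + log_noise(observed, s) for s, lp in candidates.items()}

# An implausible literal parse competes with a plausible one-edit neighbor.
obs = "the mother gave the candle the daughter"
cands = {
    "the mother gave the candle the daughter": math.log(1e-6),     # implausible
    "the mother gave the candle to the daughter": math.log(1e-2),  # plausible
}
print(posterior(obs, cands))
```

With these toy numbers, the plausible one-edit alternative scores higher than the implausible literal reading, which is the qualitative pattern in prediction (i); raising the assumed noise rate strengthens the nonliteral interpretation, as in prediction (iii).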
We demonstrate a substantial improvement on one of the most celebrated empirical laws in the study of language, Zipf's 75-y-old theory that word length is primarily determined by frequency of use. In accord with rational theories of communication, we show across 10 languages that average information content is a much better predictor of word length than frequency. This indicates that human lexicons are efficiently structured for communication by taking into account interword statistical dependencies. Lexical systems result from an optimization of communicative pressures, coding meanings efficiently given the complex statistics of natural language use.

Keywords: information theory | rational analysis

One widely known and apparently universal property of human language is that frequent words tend to be short. This law was popularized by Harvard linguist George Kingsley Zipf, who observed that "the magnitude of words tends, on the whole, to stand in an inverse (not necessarily proportionate) relationship to the number of occurrences" (1). Zipf theorized that this pattern resulted from a pressure for communicative efficiency: information can be conveyed as concisely as possible by giving the most frequently used meanings the shortest word forms, much as in variable-length (e.g., Huffman) codes. This strategy provided one key exemplar of Zipf's principle of least effort, a grand "principle that governs our entire individual and collective behavior of all sorts, including the behavior of our language" (2). Zipf's idea of assigning word length by frequency is maximally concise and efficient only if words occur independently from a stationary distribution. However, natural language use is highly nonstationary, as word probabilities change depending on their context. A more efficient code for meanings can therefore be constructed by respecting the statistical dependencies between words. Here, we show that human lexical systems are such codes, with word length primarily determined by the average amount of information a word conveys in context. The exact forms of the frequency-length relationship (3, 4) and the distribution of word lengths (5) have been quantitatively evaluated previously. In contrast, information content offers an empirically supported and rationally motivated alternative to Zipf's frequency-length relationship.

A lexicon that assigns word lengths based on information content differs from Zipf's theory in two key ways. First, such a lexicon would not be the most concise one possible, as it would not shorten highly informative words even if shorter distinctive word forms were available. Second, unlike Zipf's system, assigning word length based on information content keeps the information rate of communication as constant as possible (6). A tendency toward this type of "smoothing out" of peaks and dips of informativeness is known as uniform information density and has been observed in choices made during online language production (7-10). Formally, uniform information density holds that language users make choices that keep the information rate of communication as close to constant as possible...
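Here is a minimal sketch of the two competing predictors of word length, assuming a bigram estimate as the simplest stand-in for "context" (the paper used 2-, 3-, and 4-gram models estimated from large corpora); the function name and the unsmoothed probability estimates are illustrative.

```python
from collections import Counter
import math

def length_predictors(tokens):
    """For each word type, return (a) Zipf's predictor, the negative log
    unigram frequency, and (b) mean information content, the average of
    -log2 P(word | previous word) over the word's occurrences."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)
    info_sum = Counter()
    for (prev, w), c in bigrams.items():
        # each of these c occurrences of w carries surprisal -log2 P(w | prev)
        info_sum[w] += c * -math.log2(c / unigrams[prev])
    return {w: (-math.log2(unigrams[w] / total),   # frequency predictor
                info_sum[w] / unigrams[w])          # mean info content in context
            for w in unigrams}
```

The paper's central result is that, across the 10 languages studied, word length correlates better with the second quantity than with the first.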
The frequency distribution of words has been a key object of study in statistical linguistics for the past 70 years. This distribution approximately follows a simple mathematical form known as Zipf's law. This article first shows that human language has a highly complex, reliable structure in the frequency distribution over and above this classic law, although prior data visualization methods have obscured this fact. A number of empirical phenomena related to word frequencies are then reviewed. These facts are chosen to be informative about the mechanisms giving rise to Zipf's law and are then used to evaluate many of the theoretical explanations of Zipf's law in language. No prior account straightforwardly explains all the basic facts or is supported with independent evaluation of its underlying assumptions. To make progress at understanding why language obeys Zipf's law, studies must seek evidence beyond the law itself, testing assumptions and evaluating novel predictions with new, independent data.
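For reference, the law states that a word's frequency falls off as a power of its frequency rank, f(r) ∝ r^(−α) with α near 1. The sketch below estimates the exponent with a plain log-log least-squares fit; this fit is known to be statistically crude (one of the article's own points about needing better evaluation), and is included only to make the law's form concrete.

```python
from collections import Counter
import numpy as np

def zipf_exponent(tokens):
    """Fit f(r) ~ C * r**(-alpha) by least squares in log-log space.
    Natural-language corpora typically yield alpha near 1, though the
    article argues such fits alone cannot distinguish among the
    proposed explanations of the law."""
    freqs = np.sort(np.fromiter(Counter(tokens).values(), dtype=float))[::-1]
    ranks = np.arange(1, len(freqs) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return -slope
```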