Corpus-based studies have become increasingly common in linguistic typology over recent years, amounting to the emergence of a new field that we call corpus-based typology. The core idea of corpus-based typology is to take languages as populations of utterances and to systematically investigate text production across languages in this sense. From a usage-based perspective, investigations of variation and preferences of use are at the core of understanding the distribution of conventionalized structures and their diachronic development across languages. Specific findings of corpus-based typological studies pertain to universals of text production, for example, in prosodic partitioning; to cognitive biases constraining diverse patterns of use, for example, in constituent order; and to correlations of diverse patterns of use with language-specific structures and conventions. We also consider the remaining challenges for corpus-based typology, in particular the development of crosslinguistically more representative corpora that include spoken (or signed) texts, as well as the field's vast future potential.
The introduction of new referents into discourse has traditionally been regarded as a major challenge to language processing, for which speakers deploy specific syntactic configurations, guided by the speaker’s assessment of the recipient’s state of mind (‘recipient design’). In this paper we probe these assumptions against discourse data from nine languages. We find little evidence for specialized syntactic configurations accommodating new referents; the only notable exception is the association of new reference with direct objects, which suggests that linking new referents to already established discourse frames through a transitive construction is preferable to isolating them in an intransitive one. Where specific intransitive predicates are indeed found to host new referents, we find this to be motivated primarily by semantic considerations. Contrary to long-held assumptions, we conclude that the cognitive challenge of referent introduction is only weakly reflected in morphosyntax; instead, discourse production is most efficient when new referents are integrated seamlessly with content-driven demands of the narration.
It has been argued that speakers employ morphosyntactic structures such as presentationals and left-dislocations (Lambrecht 1994) to establish new entities in discourse due to considerations of referent accessibility vis-à-vis event processing (Du Bois 1987; Chafe 1987). We here investigate whether introductions are sensitive to the salience of the discourse referent in subsequent discourse (Himmelmann 1996; Lichtenberk 1996). This hypothesis is tested against spoken corpus data from twelve diverse languages. While the use of specific morphosyntactic structures does correlate with discourse prominence, humanness has a much stronger effect. Subsequent discourse salience is hence not the chief determinant of the syntactic positions of new mentions; the convergence of humanness and semantic role associations in specific syntactic positions better explains the attested patterns.