In this paper we outline a number of issues and problems which arise during the process of contrastive human-coded corpus annotation of certain semantic and discourse categories within the framework of the CONTRANOT project, aimed at the creation and validation of contrastive functional descriptions through corpus analysis and annotation. Human-coded corpus annotation is a preliminary step for the training of computer algorithms which allow the automation of the annotation of large corpora, but it can also serve as a mechanism for testing aspects of linguistic theories empirically, such as theory formation and theory-redefinition, as well as enriching theories with quantitative information. The work reported in this paper focuses on the annotation of the category of Thematisation, on the one hand, and on Modality, on the other, to illustrate the challenges researchers have to face when confronted with the task of developing well-designed and reliable annotation procedures for complex linguistic phenomena in a contrastive manner. We describe the annotation tasks and procedures developed so far, which include the design of annotation schemas on the basis of available linguistic theories and the testing of their reliability through agreement studies. We also evaluate and discuss the results of the annotations on the basis of their relevance for the theoretical characterisation of the investigated phenomena. We expect that our work will have an impact in the area of contrastive textual analysis, and that it will pave the way for the development of automated annotation systems for computational applications.
The purpose of this paper is to analyze how the clausal thematic features observed in two newspaper genres (news reports and commentaries) can be interpreted as textual signals of their different generic characterization. This is done through the qualitative and quantitative analysis of a sample consisting of thirty-three English texts, divided into two groups of seventeen news reports and sixteen commentaries, respectively. The analysis focused on the following thematic features: (1) the experiential elements selected as Thematic Heads; (2) the semantic nature of the nominal elements realizing these Heads and their internal structure; (3) the textual and interpersonal thematic choices as part of a multiple theme. The analysis reveals that each newspaper genre prefers certain thematic features and that the differences between the two genres are statistically significant. It is suggested that these thematic preferences can be attributed to genre-related variables such as the communicative purpose or the subject matter of the text.
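The abstract does not state which significance test was applied. As an illustration only, the sketch below assumes a Pearson chi-square test of independence on a genre-by-feature contingency table. The function name `chi_square_independence` and the counts in the usage lines are hypothetical and are not taken from the study.

```python
def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a contingency table.

    `table` is a list of rows, e.g. one row per genre and one column
    per thematic feature. The choice of test is an assumption; the
    source does not name the test it used.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if grand_total == 0:
        raise ValueError("table must contain at least one observation")

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of genre and feature.
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected == 0:
                raise ValueError("a row or column has a zero total")
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows = genres, columns = two thematic choices.
example_counts = [[10, 20], [20, 10]]
stat, dof = chi_square_independence(example_counts)
```

The statistic would then be compared against the chi-square distribution with `dof` degrees of freedom (for one degree of freedom, the conventional 0.05 critical value is about 3.84).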
This chapter reports on the contrastive analysis of interpersonal discourse markers (IDMs) in a sample of English and Spanish newspaper texts in three genres: news reports, editorials and letters to the editor. The sample was divided into a training dataset of eighteen (English-Spanish) comparable texts and a larger dataset of 220 texts, divided into 60 news reports, 60 editorials and 100 letters to the editor. Following the methodology of Hovy & Lavid (2010), we present a preliminary annotation scheme validated by an inter-annotator agreement study. We then present the results of annotating the larger dataset, which reveal genre-related and language-specific variation in the distribution of IDMs in these newspaper genres. We discuss and provide some possible explanations for the results obtained.
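Agreement studies of this kind are often summarised with a chance-corrected coefficient. The abstract does not say which coefficient was used, so the following is a minimal sketch assuming Cohen's kappa for two annotators. The function name `cohen_kappa` and the labels "A" and "B" are hypothetical placeholders, not categories from the annotation scheme.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumed metric for illustration; the source does not specify
    which agreement coefficient was computed.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)
    # Proportion of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, from each annotator's label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations of four items by two coders.
coder_1 = ["A", "A", "B", "B"]
coder_2 = ["A", "B", "B", "B"]
kappa = cohen_kappa(coder_1, coder_2)
```

With more than two annotators or with missing annotations, a different coefficient (for example Fleiss' kappa or Krippendorff's alpha) would be the more usual choice.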
In this paper we present the preliminary results of an empirical study designed to test contrastive features of the category of Theme in English and Spanish through corpus analysis and manual annotation. Using as our theoretical basis the more general features of the model of thematisation proposed in Lavid, Arús and Zamorano (2010), the study describes the different steps of the methodology used, starting with the selection of the corpus used as a ‘training suite’, followed by the design of the annotation scheme, and ending with a discussion of the results of two annotation experiments carried out so far to test the reproducibility of the annotation scheme. It is expected that the work reported in this paper will have a theoretical impact on the area of contrastive corpus studies and will serve as the basis for the (semi-)automatic annotation of thematic features in larger bilingual corpora.