In this paper we outline several problems that arise in the contrastive
human-coded corpus annotation of certain semantic and discourse categories
within the CONTRANOT project, which aims to create and validate contrastive
functional descriptions through corpus analysis and annotation. Human-coded
corpus annotation is a preliminary step in training computer algorithms that
automate the annotation of large corpora, but it can also serve as a mechanism
for testing aspects of linguistic theories empirically, supporting theory
formation and theory redefinition, and for enriching theories with
quantitative information.
The work reported in this paper focuses on the annotation of two categories,
Thematisation and Modality, to illustrate the challenges researchers face
when developing well-designed and reliable contrastive annotation procedures
for complex linguistic phenomena. We describe the annotation tasks and
procedures developed so far, which include the design of annotation schemas
on the basis of available linguistic theories and the testing of their reliability
through agreement studies. We also evaluate and discuss the results of the annotations
on the basis of their relevance for the theoretical characterisation of the
investigated phenomena. We expect that our work will have an impact in the
area of contrastive textual analysis, and that it will pave the way for the development
of automated annotation systems for computational applications.
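
The abstract above mentions testing annotation reliability through agreement studies but does not say which agreement measure was used. As an illustration only, the sketch below computes Cohen's kappa for two coders, one common choice for this kind of study. The function name `cohen_kappa` and the modality labels in the toy data are assumptions made for the example; they are not taken from the CONTRANOT annotation schema or its results.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labelled the same items.

    Hypothetical helper for illustration; not part of any project tooling.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty item list")

    n = len(labels_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each category, the product of the two coders'
    # marginal proportions, summed over categories.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        # Both coders used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented toy labels, purely to show the call; not real annotation data.
coder_a = ["epistemic", "deontic", "epistemic", "dynamic", "deontic", "epistemic"]
coder_b = ["epistemic", "deontic", "dynamic", "dynamic", "deontic", "epistemic"]
kappa = cohen_kappa(coder_a, coder_b)
print(round(kappa, 2))
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which matters when one category dominates the data. Studies with more than two coders, or with missing labels, would typically need a different measure.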
This paper outlines ongoing work on the construction of a high-quality, richly annotated and register-diversified parallel corpus for the English-Spanish language pair, carried out within the framework of the MULTINOT project. The corpus consists of original and translated texts in both directions and is designed as a multifunctional resource for use in disciplines such as corpus-based contrastive linguistics and translation studies, machine translation, computer-assisted translation, computer-assisted language learning and terminology extraction. The paper describes the structure of the corpus, which comprises four subcorpora: English originals (EO), Spanish originals (SO), English translations (Etrans) and Spanish translations (Strans). It also describes the registers selected for inclusion and the methodology used to guarantee the quality of the processing steps that enrich the corpus with linguistic information at different levels.
The purpose of this paper is to analyze how the clausal thematic features observed in two newspaper genres, news reports and commentaries, can be interpreted as textual signals of their different generic characterization. This is done through the qualitative and quantitative analysis of a sample of thirty-three English texts: seventeen news reports and sixteen commentaries. The analysis focuses on the following thematic features: (1) the experiential elements selected as Thematic Heads; (2) the semantic nature of the nominal elements realizing these Heads and their internal structure; (3) the textual and interpersonal thematic choices as part of a multiple theme. The analysis reveals that each newspaper genre prefers certain thematic features and that the differences between the two genres are statistically significant. It is suggested that these thematic preferences can be attributed to genre-related variables such as the communicative purpose or the subject matter of the text.