The web has unique potential among corpora to yield large-volume data on up-to-date language use, obvious shortcomings notwithstanding. Since 1998, we [...]
Introduction
The Research Unit is a multi-disciplinary team of linguists, software engineers and statisticians which works to understand and describe language in use, and to apply this knowledge. The language in question has primarily been English, and the applications have primarily been in the fields of information extraction, retrieval and management, but we are also mindful of the needs of linguistic researchers, language teachers and learners, both in English and in other languages. We regard language as a changing phenomenon, and we thus began early on to build systems to accumulate and process journalistic text chronologically, to complement existing finite, synchronic corpora. When web text emerged in the nineties, we had been analysing evolving, particularly neologistic, language use in very large textual databases for almost a decade. We were thus well placed to appreciate the advantage of web-based text over the increasingly historical entities which stand as representatives of 'current English': web text would allow the fine-tuning of the picture of what is current usage, providing access to aspects and domains of language which were missing from corpora. Web text presented a serendipitous opportunity, and its many well-rehearsed shortcomings were outweighed by the advantages it offered of access to free, plentiful, updated and up-to-date data.
This study addresses a familiar challenge in corpus pragmatic research: the search for functional phenomena in large electronic corpora. Speech acts are one area of research that falls into this functional domain and the question of how to identify them in corpora has occupied researchers over the past 20 years. This study focuses on apologies as a speech act that is characterised by a standard set of routine expressions, making it easier to search for with corpus linguistic tools. Nevertheless, even for a comparatively formulaic speech act, such as apologies, the polysemous nature of forms (cf. e.g. I am sorry vs. a sorry state) impacts the precision of the search output so that previous studies of smaller data samples had to resort to manual microanalysis. In this study, we introduce an innovative methodological approach that demonstrates how the combination of different types of collocational analysis can facilitate the study of speech acts in larger corpora. By first establishing a collocational profile for each of the Illocutionary Force Indicating Devices associated with apologies and then scrutinising their shared and unique collocates, unwanted hits can be discarded and the amount of manual intervention reduced. Thus, this article introduces new possibilities in the field of corpus-based speech act analysis and encourages the study of pragmatic phenomena in large corpora.
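The idea of profiling each IFID by its collocates and then comparing the profiles can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes single-token search terms, a plain frequency count within a fixed window rather than an association measure, and a toy token list. The function names `collocates` and `shared_and_unique` are hypothetical.

```python
from collections import Counter


def collocates(tokens, node, window=2):
    """Count words occurring within `window` tokens of each hit for `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def shared_and_unique(profiles):
    """Split collocates into those shared by all nodes and those unique to one."""
    sets = {node: set(profile) for node, profile in profiles.items()}
    shared = set.intersection(*sets.values())
    unique = {}
    for node, own in sets.items():
        others = set().union(*(s for n, s in sets.items() if n != node))
        unique[node] = own - others
    return shared, unique


# Toy data (assumption): two apology uses and one non-apology use of "sorry".
tokens = ("i am sorry for the delay . please excuse the delay . "
          "what a sorry state").split()

profiles = {node: collocates(tokens, node) for node in ("sorry", "excuse")}
shared, unique = shared_and_unique(profiles)
```

In this toy example, a collocate such as "state" appears only in the profile of "sorry", which is the kind of signal that could flag hits like "a sorry state" for exclusion; a real study would work with statistically weighted collocates over a large corpus.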
This chapter studies the form oops and its function as an Illocutionary Force Indicating Device (IFID) signalling apologies in a corpus of blog posts and reader comments. The focus is on the adaptability of speech acts to online media and the implications for the formal choice of linguistic expressions beyond the prototypical examples of routinised apology IFIDs. Thus, this study takes a closer look at the pragmatic functions of oops in the Birmingham Blog Corpus, a diachronically-structured collection covering the period 2000-2010, to gain new insights into its use and distribution.
The WebCorp project has demonstrated how the Web may be used as a source of linguistic data. One feature of standard corpus analysis tools hitherto missing in WebCorp is the ability to filter and sort results by date. This paper discusses the dating mechanisms available on the Web and the date query facilities offered by standard Web search engines. The new date heuristics built into WebCorp are then discussed and illustrated with a case study.