Encoding document information in a corpus of student writing: the British Academic Written English corpus

Ebeling, Signe Oksefjell; Heuboeck, Alois

doi:10.3366/cor.2007.2.2.241

Cited by 16 publications

(10 citation statements)

References 5 publications

“…The word counts given in Table 1 are exclusive of the ignored material. See Ebeling and Heuboeck (2007) and the respective corpus manuals (Paquot et al 2010, Heuboeck, Holmes, and Nesi 2008) for more information regarding the annotation.…”

Section: Methods (mentioning)

confidence: 99%

Phraseological teddy bears

Hasselgård

2019

Corpus Linguistics, Context and Culture

This paper compares frequent four-word lexical bundles in a learner corpus (VESPA) and a native speaker corpus (BAWE), both representing novice academic writing. The frequencies and dispersion of bundles in the two corpora reveal patterns of both over- and underuse among the learners. The learners are shown to use some bundles very frequently, but frequencies drop more sharply than in the native corpus. The dispersion of the frequent bundles tends to be broader in the native speaker corpus. In a closer scrutiny of four selected bundles the novice-expert dimension is addressed by consulting a corpus of published research articles. Contrasts between English and Norwegian are also considered in order to explain the learners' apparently non-native usage. Some of the most overused bundles seem to have been generalized by the learners to fit into contexts where native speakers rarely use them; these can be described as 'phraseological teddy bears'. Pedagogical applications of the results should start from the underused items in order to broaden the phraseological repertoire of the learners.

1 Ellis (2012: 29) uses the term phrasal teddy bear to refer to "highly frequent and prototypically functional phrases like put it on the table, how are you?, it's lunch time", or "formulaic phrases with routine functional purposes" (ibid.: 37). Since lexical bundles, unlike Ellis's formulaic sequences, do not require word strings to be idiomatic or complete functional units, I have opted for the related term phraseological teddy bear.

“…These steps resulted in marked-up .xml files that are then ready for inclusion into VESPA. More specifically, following the procedure used in the BAWE corpus (Ebeling and Heuboeck 2007; Heuboeck et al 2008), the texts were first processed using Word macros to annotate main sections (e.g. abstract, introduction), block quotes and so-called mentioned items (e.g.…”

Section: Vespa: Corpus Compilation Corpus Processing and Access (mentioning)

confidence: 99%

The Varieties of English for Specific Purposes dAtabase (VESPA): Towards a multi-L1 and multi-register learner corpus of disciplinary writing

Paquot, Larsson, Hasselgård et al.

2022

RiCL

Self Cite

The Varieties of English for Specific Purposes dAtabase (VESPA first release) is the result of an international corpus compilation project that aims to address the lack of large-scale, open access, multi-L1, multi-discipline and multi-register learner corpora. This corpus report provides a detailed description of VESPA and illustrates possible uses of the corpus for register exploration of learner data. Specifically, it first offers an overview of the makeup of the corpus and the online interface that can be used to search and download the corpus. It then gives an illustrative example of a study where multi-dimensional analysis was used to investigate the relative importance of register vis-à-vis other factors in learner academic writing. In the concluding remarks, we identify priorities for future developments in the VESPA project, including the addition of more L1 components, more disciplines and more registers, as well as the compilation of a comparable corpus of native student writing.

“…We plan to make MICUSP available in both full XML and plain text (i.e., with no annotation or metadata). While for each of the original file formats (e.g., Word, PDF) it might have been possible to make use of the formatting codes and styles in the original document (as demonstrated in Ebeling and Heuboeck, 2007), we decided to transform all the original files into plain text Unicode at the start of the conversion process in order to achieve consistency in the subsequent steps.…”

Section: Corpus Conversion Markup and Annotation (mentioning)

confidence: 99%

From student hard drive to web corpus (part 2): the annotation and online distribution of the Michigan Corpus of Upper-level Student Papers (MICUSP)

O’Donnell, Römer

2012

Corpora

This paper continues the detailed account of the central steps involved in compiling and distributing the Michigan Corpus of Upper-level Student Papers (MICUSP). In this paper, we discuss the annotation process used to encode MICUSP files in TEI-compliant XML, and the development of MICUSP Simple, the online application through which the corpus is now freely available online. We also describe how MICUSP Simple can be used to carry out simple word/phrase searches and to browse papers within different categories.
