2007
DOI: 10.3366/cor.2007.2.2.241

Encoding document information in a corpus of student writing: the British Academic Written English corpus

Abstract: The information contained in a document is only partly represented by the wording of the text; in addition, features of formatting and layout can be combined to lend specific functionality to chunks of text (e.g., section headings, highlighting, enumeration through list formatting, etc.). Such functional features, although based on the ‘objective’ typographical surface of the document, are often inconsistently realised and encoded only implicitly, i.e., they depend on deciphering by a competent reader. They ar…

Cited by 16 publications (10 citation statements)
References 5 publications
“…The word counts given in Table 1 are exclusive of the ignored material. See Ebeling and Heuboeck (2007) and the respective corpus manuals (Paquot et al 2010, Heuboeck, Holmes, and Nesi 2008) for more information regarding the annotation.…”
Section: Methods (mentioning)
confidence: 99%
“…These steps resulted in marked-up .xml files that are then ready for inclusion into VESPA. More specifically, following the procedure used in the BAWE corpus (Ebeling and Heuboeck 2007; Heuboeck et al 2008), the texts were first processed using Word macros to annotate main sections (e.g. abstract, introduction), block quotes and so-called mentioned items (e.g.…”
Section: Vespa: Corpus Compilation Corpus Processing and Access (mentioning)
confidence: 99%
“…We plan to make MICUSP available in both full XML and plain text (i.e., with no annotation or metadata). While for each of the original file formats (e.g., Word, PDF) it might have been possible to make use of the formatting codes and styles in the original document (as demonstrated in Ebeling and Heuboeck, 2007), we decided to transform all the original files into plain text Unicode at the start of the conversion process in order to achieve consistency in the subsequent steps.…”
Section: Corpus Conversion Markup and Annotation (mentioning)
confidence: 99%