Corpus Linguistics Beyond the Word 2007
DOI: 10.1163/9789401203845_007

Between the Humanist and the Modernist: Semi-automated Analysis of Linguistic Corpora

Cited by 6 publications (7 citation statements)
“…First, the decision to use only automatically computed measures brings up the issue of software accuracy. This was evaluated here by spot‐checks; however, future investigations should (ideally) follow a semi‐automatic approach with a systematic correction of tagger errors (Garretson & O'Connor, , see also Vyatkina, ). Second, to enhance research comparability, more studies are needed with data manually annotated for syntactic complexity measures used for similar learner populations (such as clauses, T‐units, and lexical sophistication).…”
Section: Implications and Conclusion (mentioning)
“…This study aimed to benefit from available NLP resources and used automatic corpus tools for computing length‐based measures as well as automatically assigning POS tags as proxy measures for surface syntactic structures (Aarts & Granger, ; Lu, ). However, since automatic POS‐taggers never achieve 100% accuracy even on native speaker data (Schmid, ) and learner errors may affect the accuracy rate even more (Granger, ; van Rooy & Schäfer, ), this study adopted a semi‐automated tagging procedure (Garretson & O'Connor, ). For semi‐automatic POS annotation, the learner corpus was first tagged automatically for 50 distinct word classes using the Tree Tagger for German (Schmid, ), and the output was manually checked (Meunier & de Mönnink, ). For evaluating POS annotations, the total tagger output on the writing of the 2 learners was checked manually by the researcher and independently by another annotator, using the guidelines for the tagset employed in the Tree Tagger (Schiller et al., ).…”
Section: Design (mentioning)
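The semi-automated tagging procedure in the Design excerpt (automatic tagging, then independent manual checks by two annotators) can be illustrated with a minimal sketch. It assumes the tag sequences are already aligned token by token. The German sentence and the STTS-style tags are hypothetical, and Cohen's kappa is an assumed agreement measure; the excerpt does not say which statistic, if any, was used.

```python
from collections import Counter


def tag_accuracy(auto_tags, gold_tags):
    """Share of tokens where the automatic tag equals the manually corrected tag."""
    if len(auto_tags) != len(gold_tags) or not gold_tags:
        raise ValueError("tag sequences must be non-empty and aligned token by token")
    return sum(a == g for a, g in zip(auto_tags, gold_tags)) / len(gold_tags)


def cohens_kappa(tags_a, tags_b):
    """Chance-corrected agreement between two annotators (assumed measure)."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("tag sequences must be non-empty and aligned token by token")
    n = len(tags_a)
    observed = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    count_a, count_b = Counter(tags_a), Counter(tags_b)
    # Expected agreement if each annotator assigned tags independently at their own rates
    expected = sum(count_a[t] * count_b[t] for t in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical tag throughout
    return (observed - expected) / (1 - expected)


def disagreements(tags_a, tags_b):
    """Token positions that need manual adjudication."""
    return [i for i, (a, b) in enumerate(zip(tags_a, tags_b)) if a != b]


# Hypothetical data: one sentence, tagger output with one error, two human annotators
tokens = ["Der", "Hund", "läuft", "schnell", "."]
auto = ["ART", "NN", "VVINF", "ADJD", "$."]
coder1 = ["ART", "NN", "VVFIN", "ADJD", "$."]
coder2 = ["ART", "NN", "VVFIN", "ADV", "$."]

print(tag_accuracy(auto, coder1))                          # 0.8
print(round(cohens_kappa(coder1, coder2), 3))              # 0.762
print([tokens[i] for i in disagreements(coder1, coder2)])  # ['schnell']
```

Listing disagreement positions rather than only reporting a score keeps the human in the loop: each flagged token is a candidate for the systematic correction of tagger errors that the first excerpt recommends.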
“…The final step is of course the full analysis of the data; what form this takes will vary from study to study, and while it is certainly possible for this to be carried out in part by the computer (see Garretson and O'Connor 2007), that is not an integral part of the approach advocated here.…”
Section: Division of Labor (mentioning)
“…For this reason, in fairly large-scale studies, we prefer to do as much of the data extraction as possible using automated methods but then include at least one pass of "manual" evaluation of the data. If this manual evaluation can be speeded up by the further use of computers (see Garretson and O'Connor 2007), so much the better - but the main point is that the researcher him- or herself will eventually decide which tokens are in fact relevant to the study and how they should best be coded. Semi-automated approaches enable a research team to analyse as much data as possible, as accurately as possible, as quickly as possible.…”
Section: Advantages and Disadvantages of Semi-automated Approaches (mentioning)
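The division of labor described in the last excerpt (automated extraction first, then a manual pass in which the researcher decides relevance and coding) can be sketched as follows. This is an illustration under stated assumptions, not the method of the cited chapter. The function names, the `relevant`/`code` columns, the regex-based search for "gave", and the coding categories are all hypothetical, and the manual decisions are simulated in code.

```python
import csv
import io
import re

FIELDS = ["left", "match", "right", "relevant", "code"]


def extract_candidates(text, pattern, width=20):
    """Automated pass: one keyword-in-context row per regex match, with blank fields for the coder."""
    rows = []
    for m in re.finditer(pattern, text):
        rows.append({
            "left": text[max(0, m.start() - width):m.start()],
            "match": m.group(0),
            "right": text[m.end():m.end() + width],
            "relevant": "",  # filled in manually: "y" or "n"
            "code": "",      # filled in manually: the analyst's category
        })
    return rows


def to_csv(rows):
    """Serialize rows so the manual pass can be done in a spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def keep_relevant(rows):
    """After manual coding, retain only the tokens the researcher judged relevant."""
    return [r for r in rows if r["relevant"].strip().lower() == "y"]


text = "She gave him the book. They gave up early. He gave a talk."
rows = extract_candidates(text, r"\bgave\b")

# Simulated manual decisions; in practice these would come back from the coder's spreadsheet
decisions = [("y", "ditransitive"), ("n", ""), ("y", "monotransitive")]
for row, (rel, code) in zip(rows, decisions):
    row["relevant"], row["code"] = rel, code

print(len(rows))                                # 3
print([r["code"] for r in keep_relevant(rows)]) # ['ditransitive', 'monotransitive']
```

The regex deliberately over-generates (it also catches the phrasal "gave up"), which matches the excerpt's point: the automated step maximizes recall, and the researcher's manual pass decides which tokens actually count.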