2016
DOI: 10.1007/s10579-015-9331-6

Crowdsourcing for web genre annotation

Abstract: Recently, genre collection and automatic genre identification for the web has attracted much attention. However, currently there is no genre-annotated corpus of web pages where inter-annotator reliability has been established, i.e. the corpora are either not tested for inter-annotator reliability or exhibit low inter-coder agreement. Annotation has also mostly been carried out by a small number of experts, leading to concerns with regard to scalability of these annotation efforts and transferability of the sch…

Cited by 11 publications (9 citation statements)
References 48 publications
“…Furthermore, this relationship imposes challenges for the development of robust systems for register identification (Petrenz & Webber, 2011; Sharoff et al, 2010). Although most of the Web corpora that are typically utilized tend to be relatively small-both in size and in coverage of Web registers, only representing selected registers found on the Internet (Asheghi et al, 2016; Santini, 2011; Meyer zu Eissen & Stein, 2004; Vidulin et al, 2009), there are certain collections that provide a large inventory of categories, such as the KRYS 1 corpus consisting of 70 genres (Berninger et al, 2008). Given this, the performance of the systems used to automatically identify registers tend to convey that the Web registers are relatively well discriminated.…”
Section: Practical Solutions (mentioning)
confidence: 99%
“…To enable the development of more stable systems, Asheghi et al (2016) presented the Leeds Web Genre Corpus, which consists of 15 genres and 3964 documents. The Leeds Corpus was collected by first defining a set of registers (or genres) exclusively used on the Internet and then manually selecting the documents to represent these categories.…”
Section: Detecting Online Registers (mentioning)
confidence: 99%
“…Similarly, also Petrenz and Webber (2011) pointed out that an ideal automatic genre classification system "should be stable in the face of changes in topic distribution". However, Asheghi et al (2016) noted inevitable correlations between topics and registers such as recipes. Thus, analyzing stability by focusing on topical information can be restrictive.…”
Section: Towards Stable Identification Of Registers (mentioning)
confidence: 99%
“…To evaluate and improve the reproducibility of the AI2D-RST framework, future work should employ naive annotators, who are assigned tasks that do not build on concepts introduced in the annotation framework (see e.g. Asheghi et al 2016). This kind of non-theoretical grounding (Riezler 2014) could help to break circularity by evaluating, for instance, whether naive annotators perceive diagram elements to form visual groups (grouping) or whether arrows and lines are considered to signal connections between individual diagram elements or visual groups (connectivity).…”
Section: On the Reliability And Reproducibility Of The AI2D-RST Annot… (mentioning)
confidence: 99%