2021
DOI: 10.1007/s10579-020-09519-z

Exploring the role of lexis and grammar for the stable identification of register in an unrestricted corpus of web documents

Abstract: The Internet offers great possibilities for many scientific disciplines that utilize text data. However, the potential of online data can be limited by the lack of information on the genre or register of the documents, as register—whether a text is, e.g., a news article or a recipe—is arguably the most important predictor of linguistic variation (see Biber in Corpus Linguist Linguist Theory 8:9–37, 2012). Despite having received significant attention in recent years, the modeling of online registers has faced …

Cited by 8 publications (24 citation statements)
References 36 publications
“…Previous studies have shown repeatedly that registers vary considerably in terms of how well they are linguistically defined and thus how well they can be automatically identified Egbert, 2018, 2016a; Laippala et al, 2020a). For instance, while texts in the IN (Informational description) and NA (Narrative) classes, such as Encyclopedia articles and Sports reports, have very distinctive characteristics and can be identified with a very high reliability, others, such as Information blogs in the IN class or Advice in the OP (Opinion) class receive much lower scores.…”
Section: English-Swedish (mentioning)
confidence: 99%
“…Biber (2014) showed that registers, such as spoken texts, display functional similarities across languages, which obviously is needed for high-quality transfer in register identification. However, analyzing the English CORE registers, Laippala et al (2020a) noted that some registers, such as many blogs, depend highly on lexical characteristics reflecting the discussion topics. These topics, however, may vary extensively between languages.…”
Section: English-Swedish (mentioning)
confidence: 99%
“…Spring (2018) created a concise and comprehensive list of the verb phrase and their meanings, through theoretical knowledge and the concept of corpora. Laippala et al (2021) modeled online registration in the largest available online register corpus, Corpus of Online Registers of English (CORE). Besides, the estimation is implemented on the stability of the model on corpus features, an analysis is conducted on the role of different language features in it, and the differences in individual registers are examined in these two aspects.…”
Section: Discussion (mentioning)
confidence: 99%
“…Despite the considerable benefits of Automatic Genre Identification (AGI), no established classification exists (Sharoff, 2010). The genre researchers are not consistent in the use of terminology, and they refer to genres, text types, functional text dimensions or registers in different ways (Sharoff, 2018; Egbert et al, 2015; Laippala et al, 2021; Lee, 2002). Furthermore, there is no consensus on the genre definition.…”
Section: Related Work (mentioning)
confidence: 99%