2022
DOI: 10.3390/app12136648

The Saudi Novel Corpus: Design and Compilation

Abstract: Arabic has recently received significant attention from corpus compilers. This situation has led to the creation of many Arabic corpora that cover various genres, most notably the newswire genre. Yet, Arabic novels, and specifically those authored by Saudi writers, lack the sufficient digital datasets that would enhance corpus linguistic and stylistic studies of these works. Thus, Arabic lags behind English and other European languages in this context. In this paper, we present the Saudi Novels Corpus, built t…

Cited by 6 publications (5 citation statements)
References 41 publications
“…The written texts encompassed a variety of genres, including literature, folklore, newspapers, and religious texts, to ensure a diverse representation of language use in both Riau-Malay and Sundanese (cf. Alfraidi et al, 2022; Sneddon, 1996).…”
Section: Data Collection
Citation type: mentioning
confidence: 99%
“…Several corpora for genre classification have been developed over the years in multiple languages, such as English, Arabic, Spanish, and more [20][21][22][23]. Not much analogous research has been conducted on datasets in the Russian language.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…One added value distinguished in Abu Elkhair's corpus was marking up its data by adding metadata fields using SGML and XML. Alfraidi and his colleagues [21] recently introduced the Saudi Novels Corpus, a useful linguistic and stylistic research tool that contains around 3,000,000 tagged words gathered from 53 novels written by different writers and covers the period from 1930 to 2019. They outlined the steps they took and the choices they made when building the corpus.…”
Section: Literature Review
Citation type: mentioning
confidence: 99%