The Saudi Novel Corpus: Design and Compilation

Alfraidi, Tareq; Abdeen, M.; Yatimi, Ahmed; Alluhaibi, Reyadh; Al-Thubaity, Abdulmohsen

doi:10.3390/app12136648

Cited by 6 publications

(5 citation statements)

References 41 publications


“…The written texts encompassed a variety of genres, including literature, folklore, newspapers, and religious texts, to ensure a diverse representation of language use in both Riau-Malay and Sundanese (cf. Alfraidi et al, 2022;Sneddon, 1996).…”

Section: Data Collection (mentioning)

confidence: 99%

A comparative typology of verbal affixes in Riau-Malay and Sundanese

Tambusai, Nasution. 2024. Indonesian J. Appl. Linguist.


This study presents a comprehensive comparative analysis of verbal affixes in two Austronesian languages, Riau-Malay and Sundanese, with a focus on their morphological, syntactic, and semantic properties. Both languages are spoken in the Indonesian archipelago, and while they share certain linguistic characteristics, they exhibit intriguing differences in their verbal affixation systems. This study aims to contribute to the understanding of linguistic diversity within Austronesian languages and to shed light on the mechanisms underlying verb formation in these two distinct linguistic systems. By comparing and contrasting the verbal affixation systems in these languages, this study aims to reveal striking differences in terms of affix types, attachment patterns, and grammatical functions. The analysis explored the morphology of verbal affixes used in Riau-Malay and compared it with that of Sundanese. It identifies clear distinctions in the ways affixes are employed to mark tense, aspect, mood, and other grammatical categories. Furthermore, the study also investigated the syntactic roles of verbal affixes, exploring how they affect word order and argument structure in sentences. This analysis exposes intriguing patterns of valency-changing operations in Riau-Malay and Sundanese. The findings of this study are expected to enhance people’s understanding of Riau-Malay and Sundanese and to contribute to the broader typological and theoretical discussions in linguistics. The comparative analysis of these two languages provides valuable insights into the ways in which languages within the Austronesian family can diverge and adapt to their unique cultural and historical contexts. Ultimately, this study may be valuable to advance our knowledge of linguistic diversity and variation, offering a deeper appreciation of the intricate web of languages that shape human communication in the Indonesian archipelago and beyond.


“…Several corpora for genre classification have been developed over the years in multiple languages, such as English, Arabic, Spanish, and more [20][21][22][23]. Not much analogous research has been conducted on datasets in the Russian language.…”

Section: Related Work (mentioning)

confidence: 99%

Genre Classification of Books in Russian with Stylometric Features: A Case Study

Vanetik, Tiamanova, Kogan et al. 2024. Information


Within the literary domain, genres function as fundamental organizing concepts that provide readers, publishers, and academics with a unified framework. Genres are discrete categories that are distinguished by common stylistic, thematic, and structural components. They facilitate the categorization process and improve our understanding of a wide range of literary expressions. In this paper, we introduce a new dataset for genre classification of Russian books, covering 11 literary genres. We also perform dataset evaluation for the tasks of binary and multi-class genre identification. Through extensive experimentation and analysis, we explore the effectiveness of different text representations, including stylometric features, in genre classification. Our findings clarify the challenges present in classifying Russian literature by genre, revealing insights into the performance of different models across various genres. Furthermore, we address several research questions regarding the difficulty of multi-class classification compared to binary classification, and the impact of stylometric features on classification accuracy.


“…One added value distinguished in Abu Elkhair's corpus was marking up its data by adding metadata fields using SGML and XML. Alfraidi and his colleagues [21] recently introduced the Saudi Novels Corpus, a useful linguistic and stylistic research tool that contains around 3,000,000 tagged words gathered from 53 novels written by different writers and covers the period from 1930 to 2019. They outlined the steps they took and the choices they made when building the corpus.…”

Section: Literature Review (mentioning)

confidence: 99%

ARABIC CORPUS of LIBRARY and INFORMATION SCIENCE: DESIGN and CONSTRUCTION

Eddakrouri. 2023. The Egyptian Journal of Language Engineering


This paper addresses the principal considerations in creating the Arabic Corpus of Library and Information Science, a specialized Arabic corpus on the academic genre. It discusses ten phases of creation: the rationale of the Arabic Corpus of Library and Information Science, types of texts, resources of texts, legal approval, data collection, refining texts, revising texts, saving texts, coding texts, and finally, the size of the Arabic Corpus of Library and Information Science (357,485 tokens). Collecting texts of the articles was the longest and most challenging phase of building the corpus, especially when we encountered PDF or image files that various software could not read with 100% accuracy. This challenge has been overcome by considering several factors that have been clarified at this stage. The Arabic Corpus of Library and Information Science can play a significant role in addressing the salient features of the academic genre, including keyword identification, lexico-grammatical patterns, themes, topics, and index terms used in the genre of Library and Information Science. Furthermore, the steps of creating the Arabic Corpus of Library and Information Science can serve as a guide for building other corpora for any genre or language.
