2023
DOI: 10.48550/arxiv.2302.14494
Preprint

Text classification dataset and analysis for Uzbek language

Citations: cited by 2 publications (2 citation statements)
References: 0 publications
“…English-based news datasets such as 20 Newsgroups 1, Reuters-21578 2, and RCV1 3 comprise thousands of articles collected from several news websites, news magazines, and newsletters (newspapers) to create the different news corpora. Similarly, news datasets in Arabic (ALJ-news dataset [41]), Uzbek [42], Urdu (Urdu-news [43]), and South African (Setswana and Sepedi-news [28]) languages were also collected from a cross-section of news portals. For our study, the Ewe news dataset was collected from popular news portals, which include Ghana News 4, Voice of Africa 5, Togo First 6, Punch News 7, BBC-Africa 8, My Joy News 9, and Citi News 10.…”
Section: Data Collection (mentioning)
confidence: 99%
“…The portals were selected to represent various categories, such as politics, coronavirus, sports, business, entertainment, and local news. The news articles were automatically extracted using the open-source Python library Beautiful Soup 11, as in [29,42]. Eight native speakers of the Ewe language are invited to label the dataset simultaneously.…”
Section: Data Collection (mentioning)
confidence: 99%