Urdu News Dataset 1M

Hussain, Khalid; Mughal, Nimra; Ali, Irfan; Saif, Hassan; Daudpota, Sher Muhammad

doi:10.17632/834vsxnb99.3

Cited by 1 publication

(1 citation statement)

References 0 publications


“…The suggested model and its variants are trained and tested using the publically available dataset (Hussain et al, 2021). Quantitative measurements are also used to assess the models.…”

Section: Results (mentioning)

confidence: 99%

Abstractive text summarization of low-resourced languages using deep learning

Shafiq, Hamid, Asif, et al. 2023

PeerJ Computer Science


Background: Humans must be able to cope with the huge amounts of information produced by the information technology revolution. As a result, automatic text summarization is being employed in a range of industries to help individuals identify the most important information. Two approaches to text summarization are mainly considered: extractive and abstractive. The extractive approach selects chunks of sentences from the source document, while the abstractive approach generates a summary based on mined keywords. For low-resourced languages such as Urdu, extractive summarization uses various models and algorithms. However, abstractive summarization in Urdu remains a challenging task. Because there are so many literary works in Urdu, producing abstractive summaries demands extensive research.

Methodology: This article proposes a deep learning model for the Urdu language using the Urdu 1 Million news dataset and compares its performance with two widely used machine learning methods, support vector machine (SVM) and logistic regression (LR). The results show that the suggested deep learning model performs better than the other two approaches. The summaries produced by extractive summarization are processed using the encoder-decoder paradigm to create an abstractive summary.

Results: With the help of Urdu language specialists, the system-generated summaries were validated, showing the proposed model’s improvement and accuracy.
