The explosion of online and offline data has changed how we gather, evaluate, and understand information. Comprehending large text documents and extracting crucial information from them is frequently difficult and time-consuming. Text summarization techniques address these problems by compressing long texts while retaining their essential content, delivering filtered, high-quality content to users quickly. Because technology and diverse sources generate massive amounts of data, automated summarization of large-scale text remains challenging. Automatic text summarization techniques fall into three types: extractive, abstractive, and hybrid. Despite progress across all three, the generated summaries still fall far short of those produced by human experts. Although Arabic is a widely spoken language that is frequently used for content sharing on the web, Arabic text summarization remains limited and immature because of several problems, including the language's morphological structure, the variety of dialects, and the lack of adequate data sources. This paper reviews text summarization approaches and the recent deep learning models applied to them. It also reviews the existing datasets for these approaches, along with their characteristics and limitations. The most commonly used metrics for evaluating summary quality are ROUGE-1, ROUGE-2, ROUGE-L, and BLEU. The challenges encountered by Arabic text summarization methods, and the solutions proposed in each approach, are analyzed. Many Arabic summarization methods suffer from problems such as the lack of gold tokens during testing, out-of-vocabulary (OOV) words, repeated summary sentences, the lack of standard systematic methodologies and architectures, and the complexity of the Arabic language. Finally, providing the required corpora, improving evaluation with semantic representations, addressing the limited use of ROUGE metrics in abstractive text summarization, and adapting recent deep learning models to Arabic summarization remain essential demands.
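The metrics named above (ROUGE-1, ROUGE-2, ROUGE-L, and BLEU) are all overlap-based measures. As a minimal illustrative sketch only (not code from any of the reviewed works), the Python snippet below shows how ROUGE-N recall can be computed as the clipped n-gram overlap between a candidate summary and a reference summary; the example sentences are hypothetical, and real Arabic evaluation would normally apply normalization and segmentation before tokenizing.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a multiset (Counter) of n-grams for a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(reference, candidate, n=1):
    """ROUGE-N recall: fraction of reference n-grams covered by the candidate."""
    ref_ngrams = ngrams(reference.split(), n)
    cand_ngrams = ngrams(candidate.split(), n)
    overlap = sum((ref_ngrams & cand_ngrams).values())  # clipped n-gram matches
    total = sum(ref_ngrams.values())
    return overlap / total if total else 0.0

# Toy example with plain whitespace tokenization (hypothetical sentences).
ref = "النص الأصلي يحتوي على معلومات مهمة"
cand = "النص يحتوي على معلومات"
print(rouge_n_recall(ref, cand, n=1))  # ROUGE-1 recall
print(rouge_n_recall(ref, cand, n=2))  # ROUGE-2 recall
```

ROUGE-L differs in that it scores the longest common subsequence rather than fixed-length n-grams, and BLEU is precision-oriented with a brevity penalty, but the same n-gram counting machinery underlies all of them.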
Text summarization is essential in natural language processing as data volumes grow rapidly, and users need that data condensed into meaningful text in a short time. There are two common methods of text summarization: extractive and abstractive. Many efforts target the summarization of Latin-script texts; summarizing Arabic texts, however, is challenging for many reasons, including the language's complexity, structure, and morphology. There is also a need for benchmark data sources and gold-standard Arabic summaries for evaluation. The contribution of this paper is therefore multi-fold. First, it proposes a hybrid approach consisting of a Modified Sequence-To-Sequence (MSTS) model and a transformer-based model. The seq-to-seq model is modified by adding multi-layer encoders and a one-layer decoder to its structure, and its output is the extractive summary; to generate the abstractive summary, the extractive summary is processed by the transformer-based model. Second, it introduces a new Arabic benchmark dataset, called HASD, which includes 43k articles with their extractive and abstractive summaries. Third, it extends the well-known extractive EASC benchmark by adding an abstractive summary to each text. Finally, it proposes a new measure, Arabic-ROUGE, which evaluates abstractive summaries based on structure and similarity between words. The proposed method is tested on the proposed HASD and modified EASC benchmarks and evaluated using ROUGE, BLEU, and Arabic-ROUGE, showing satisfactory results compared to state-of-the-art methods.
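The abstract describes the MSTS component only at a high level: a seq-to-seq model with a multi-layer encoder feeding a single-layer decoder. The PyTorch sketch below illustrates that asymmetric encoder/decoder layout under assumed choices; it is not the authors' implementation, and the LSTM cells, embedding size, hidden size, and layer count are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class Seq2SeqSketch(nn.Module):
    """Rough sketch of a multi-layer-encoder / one-layer-decoder seq2seq.
    Hyperparameters and cell types are assumptions, not the paper's setup."""

    def __init__(self, vocab_size, emb_dim=128, hidden=256, enc_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Multi-layer encoder, as the abstract describes.
        self.encoder = nn.LSTM(emb_dim, hidden, num_layers=enc_layers, batch_first=True)
        # Single-layer decoder.
        self.decoder = nn.LSTM(emb_dim, hidden, num_layers=1, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src_ids, tgt_ids):
        _, (h, c) = self.encoder(self.embed(src_ids))
        # Hand only the top encoder layer's state to the one-layer decoder.
        dec_out, _ = self.decoder(self.embed(tgt_ids), (h[-1:], c[-1:]))
        return self.out(dec_out)  # token logits over the vocabulary

# Toy forward pass with random token ids.
model = Seq2SeqSketch(vocab_size=30000)
logits = model(torch.randint(0, 30000, (2, 50)), torch.randint(0, 30000, (2, 20)))
print(logits.shape)  # torch.Size([2, 20, 30000])
```

In the hybrid pipeline the abstract outlines, the extractive output of such a model would then be rewritten by a transformer-based model to produce the final abstractive summary.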