Purpose – The overwhelming speed and scale of digital media production greatly outpace conventional indexing by humans. The management of Big Data for e-library speech resources requires an automated metadata solution. The paper aims to discuss these issues.
Design/methodology/approach – A conceptual model called semantic ontologies for multimedia indexing (SOMI) allows for assembly of the speech objects, encapsulation of semantic associations between phonic units and the definition of indexing techniques designed to invoke and maximize the semantic ontologies for indexing. A literature review and architectural overview are followed by evaluation techniques and a conclusion.
Findings – This approach is only possible because of recent innovations in automated speech recognition. The introduction of semantic keyword spotting allows for indexing models that disambiguate and prioritize meaning using probability algorithms within a word confusion network. AI error-training procedures are used to seek an optimum for each index item.
Research limitations/implications – Validation and implementation of this approach within the field of digital libraries remain under development, but rapid developments in technology and research show rich conceptual promise for automated speech indexing.
Practical implications – The SOMI process has been preliminarily tested, showing that hybrid semantic-ontological approaches produce better accuracy than semantic automation alone.
Social implications – Even as testing proceeds on recorded conference talks at the University of Tebessa (Algeria), other digital archives can look toward similar indexing. This will mean greater access to sound file metadata.
Originality/value – Huge masses of spoken data, unmanageable for a human indexer, can prospectively receive semantically sorted and prioritized indexing (not transcription, but generated metadata) automatically, quickly and accurately.
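The keyword spotting over a word confusion network described in the Findings can be illustrated with a minimal sketch. It is not the SOMI implementation: the data structure, the `spot_keywords` function, the threshold value and the example network are all assumptions made for illustration. Each slot of the network holds competing word hypotheses with posterior probabilities, and a keyword becomes an index term only when its posterior reaches the threshold.

```python
from typing import Dict, List, Tuple

# A word confusion network, modelled here (as an assumption) as one dict per
# time slot mapping each competing word hypothesis to its posterior probability.
ConfusionNetwork = List[Dict[str, float]]


def spot_keywords(
    network: ConfusionNetwork, keywords: List[str], threshold: float = 0.3
) -> List[Tuple[str, int, float]]:
    """Return (keyword, slot index, posterior) hits, most probable first."""
    hits = []
    for slot_index, slot in enumerate(network):
        for keyword in keywords:
            posterior = slot.get(keyword, 0.0)
            if posterior >= threshold:
                hits.append((keyword, slot_index, posterior))
    # Sort by descending posterior so the most reliable index terms come first.
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Hypothetical network for a three-word utterance (illustrative values only).
example_network: ConfusionNetwork = [
    {"digital": 0.9, "visual": 0.1},
    {"library": 0.6, "liberty": 0.4},
    {"index": 0.75, "indent": 0.25},
]

index_terms = spot_keywords(
    example_network, ["library", "index", "liberty"], threshold=0.5
)
print(index_terms)
```

In this toy network "liberty" competes with "library" in the same slot but falls below the threshold, so only the more probable hypothesis is kept as an index term.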
With the growth of the Internet, algorithms have come to govern nearly every aspect of digital content. It is ironic, given the Arabic roots of algorithms, that Arabic Quranic content has yet to benefit fully from computational linguistics, especially with the advent of artificial intelligence algorithms. The proliferation of Islamic websites and applications has spread digital Quranic content widely. Unfortunately, such content is rarely vetted or traceable to a reliable source. It is difficult, especially for a non-native speaker of Arabic, to distinguish authentic Quranic verses from non-Quranic Arabic text. Text-processing techniques from outside the field of Natural Language Processing (NLP) give poorer results, especially on Arabic text. To address this problem, we propose to explore Word Embeddings (WE) with Deep Learning (DL) techniques to identify Quranic verses in Arabic textual content. The proposed work is evaluated using twelve different word embedding models with two popular classifiers for binary classification: Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). The experimental results show that the proposed approach outperforms traditional methods in distinguishing Quranic verses from other Arabic text, with an accuracy of 98.33%.
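The general pipeline of embedding a text and then classifying it as verse or non-verse can be sketched as follows. This is a deliberately simplified stand-in, not the authors' method: it swaps the CNN and LSTM classifiers for logistic regression over averaged word vectors so that it runs with the standard library alone. The two-dimensional embeddings, the placeholder token names and the training samples are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical 2-dimensional embeddings with placeholder tokens; a real system
# would load a pretrained Arabic word embedding model instead.
EMBEDDINGS: Dict[str, List[float]] = {
    "q_tok1": [1.0, 0.1],
    "q_tok2": [0.9, 0.2],
    "o_tok1": [0.1, 1.0],
    "o_tok2": [0.2, 0.8],
}
DIM = 2


def sentence_vector(tokens: List[str]) -> List[float]:
    """Average the embeddings of known tokens; unknown tokens are skipped."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        return [0.0] * DIM
    return [sum(v[i] for v in known) / len(known) for i in range(DIM)]


def predict_proba(weights: List[float], bias: float, vector: List[float]) -> float:
    """Logistic regression: probability that the text belongs to class 1."""
    z = sum(w * x for w, x in zip(weights, vector)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def train(
    samples: List[Tuple[List[str], int]], epochs: int = 500, learning_rate: float = 0.5
) -> Tuple[List[float], float]:
    """Fit weights with plain stochastic gradient descent on the log loss."""
    weights, bias = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for tokens, label in samples:
            vector = sentence_vector(tokens)
            error = predict_proba(weights, bias, vector) - label
            weights = [w - learning_rate * error * x for w, x in zip(weights, vector)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical training set: label 1 = verse, label 0 = other text.
TRAINING = [
    (["q_tok1", "q_tok2"], 1),
    (["q_tok2"], 1),
    (["o_tok1", "o_tok2"], 0),
    (["o_tok1"], 0),
]

weights, bias = train(TRAINING)
verse_score = predict_proba(weights, bias, sentence_vector(["q_tok1"]))
other_score = predict_proba(weights, bias, sentence_vector(["o_tok2"]))
print(verse_score > 0.5, other_score > 0.5)
```

Averaging discards word order, which is precisely what sequence models such as CNN and LSTM are meant to capture, so this sketch shows only the shape of the task.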