Taha Zerrouki scite author profile

Taha Zerrouki

5 Publications

52 Citation Statements Received

9 Citation Statements Given


Affiliations

University of Bouira, École Nationale Supérieure d'Informatique, École Normale Supérieure - PSL

Publications


Tashkeela: Novel corpus of Arabic vocalized texts, data for auto-diacritization systems

Zerrouki

Balla

2017

Data in Brief


Arabic diacritics are often omitted in Arabic script. This omission is a handicap for new learners reading Arabic, for text-to-speech conversion systems, and for the reading and semantic analysis of Arabic texts. Automatic diacritization systems are the best solution to this issue, but such automation needs resources, such as diacritized texts, to train and evaluate these systems. In this paper, we describe our corpus of Arabic diacritized texts, called Tashkeela. It can be used as a linguistic resource for natural language processing tasks such as automatic diacritization, disambiguation, and feature and data extraction. The corpus is freely available; it contains 75 million fully vocalized words, mainly from 97 books of classical and modern Arabic. The corpus was collected from manually vocalized texts using a web crawling process.
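As a rough illustration of how a vocalized corpus can serve as training data for auto-diacritization, the sketch below strips the diacritics from a vocalized string to form an (input, target) pair. The function names and the chosen set of code points are assumptions made for this example; they are not taken from the Tashkeela paper or its tooling.

```python
# Arabic diacritic (tashkeel) code points: fathatan through sukun
# (U+064B to U+0652), plus the superscript alef (U+0670).
# This set is an assumption for the sketch, not the corpus definition.
TASHKEEL = {chr(c) for c in range(0x064B, 0x0653)} | {"\u0670"}


def strip_tashkeel(text: str) -> str:
    """Return the text with the diacritics listed above removed."""
    return "".join(ch for ch in text if ch not in TASHKEEL)


def make_pair(vocalized: str) -> tuple[str, str]:
    """Build an (undiacritized input, diacritized target) training pair."""
    return strip_tashkeel(vocalized), vocalized
```

A diacritization model would be trained to map the first element of each pair back to the second, and evaluated on pairs held out from training.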


Autocorrection of arabic common errors for large text corpus

Zerrouki

Alhawiti

Balla

2014


Automatic correction of misspelled words means offering a single proposal to correct a mistake, for example, switching two letters, omitting a letter, or a key press error. In Arabic, there are some typical common errors based on letter confusion, such as confusion over the form of Hamza (همزة), confusion between Daad (ضاد) and Za (ظاء), and the omission of dots on Yeh (ياء) and Teh (تاء). In this paper, we describe a rule-based system for the automatic correction of common errors in Arabic that uses two methods: a list of words and regular expressions.
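The two methods named in the abstract, a word list and regular expressions, can be sketched as follows. This is a minimal illustration, not the authors' system: the dictionary entries, the single regex rule, and the function name autocorrect are all hypothetical, and the Hamza rule is deliberately simplified (some loanwords beginning with the same letters would be exceptions).

```python
import re

# Hypothetical word list (common misspelling -> correction).
# Entries are illustrative only, not taken from the paper.
WORD_LIST = {
    "لاكن": "لكن",
    "هاذا": "هذا",
    "ذالك": "ذلك",
    "فى": "في",
}

# Hypothetical regex rules as (compiled pattern, replacement) pairs.
REGEX_RULES = [
    # Hamza confusion: a word starting with alef-lam, alef with hamza
    # below, seen, teh is rewritten with a plain alef. Simplified rule;
    # a real system would need an exception list.
    (re.compile(r"\bالإست(?=\w)"), "الاست"),
]


def autocorrect(text: str) -> str:
    """Apply whole-word list lookups, then the regex rules, in that order."""

    def fix_word(match: re.Match) -> str:
        word = match.group(0)
        return WORD_LIST.get(word, word)

    # Pass 1: replace whole words that appear in the word list.
    text = re.sub(r"\w+", fix_word, text)
    # Pass 2: apply each pattern-based rule.
    for pattern, replacement in REGEX_RULES:
        text = pattern.sub(replacement, text)
    return text
```

The word list handles fixed misspellings cheaply, while the regular expressions cover families of words that share an error pattern, which is why combining the two suits a large corpus.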


Adapting eSpeak to Arabic language: converting Arabic text to speech language using eSpeak

Zerrouki

Shquier

Balla

et al. 2019

IJRIS


Text to speech (TTS) is a crucial tool needed in many domains, mainly for visually impaired users. The availability of open-source TTS improves access to computers and enables more valuable applications. eSpeak is a tool that provides rules and phoneme files for more than 50 languages; it is also light, fast, low in memory consumption, and multi-platform. In this paper, we explore the possibility of adapting the existing text-to-speech converters in eSpeak to the Arabic language. We attempt to define new text-to-speech conversion rules, adapting existing phonemes and adding missing phonemes for Arabic under eSpeak. The contributions are quite significant, and the software's developers will be able to integrate these enhancements into a new version, so that users with visual impairments or children with special needs can use this development of eSpeak. The availability of such support opens new fields for using Arabic in TTS environments, especially for blind persons.


Arabic Speech Recognition Using Deep Learning and Common Voice Dataset

Oukas

Zerrouki

Haboussi

et al. 2022


A New Enhanced Arabic Light Stemmer for IR in Medical Documents

Al-Khatib

Zerrouki

Shquier

et al. 2021

