Arabic corpora have existed since the 1990s. Although their number is constantly increasing, more advanced tools and morpho-syntactically annotated Arabic corpora are still needed for research and teaching. Likewise, parallel and specialised corpora remain rare despite the growing need for them in empirical linguistic investigations of authentic Arabic texts and in language and translation teaching. Building legal corpora will therefore pave the way for more research in Arabic legal translation, an area that is under-researched worldwide. This paper discusses the building of a collection of specialised parallel and monolingual legal corpora. In particular, it discusses the building of diachronic corpora that include all available constitutions of 22 Arabic countries. The aim of compiling all available versions of these constitutions is two-fold: (1) to support interdisciplinary corpus-based and socio-cultural investigations and (2) to support research-led and blended-learning pedagogical approaches to translation teaching and learning. These corpora are thus of great value to translation trainers and researchers, law academics and professionals, and governmental, non-governmental and international organisations. The paper demonstrates the process of building these complex specialised corpora and the challenges encountered along the way. Among the challenges faced during the data collection and processing phases are (1) the difficulty of locating the original constitutions of each Arabic country, since some of them date back to 1922; and (2) file conversion and the difficulty of choosing a single Optical Character Recognition (OCR) tool to rely on for Arabic, since many such tools lack accuracy and efficiency and have encoding issues with Arabic text.