Although speech synthesis models exist for many languages and are tailored to specific domains and applications, there is currently no readily available overview of recent trends in Arabic speech synthesis. This gap makes it difficult for newcomers to research and develop text-to-speech (TTS) systems for Arabic. To address it, this article provides a comprehensive overview of scholarly contributions to Arabic TTS, together with an examination of the distinctive features of the Arabic language and the challenges they pose for building TTS systems. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, we conducted a systematic review of six databases to identify studies on Arabic TTS systems published between 2000 and 2022. Of the 3719 articles identified, only 36 (0.96%) met our search criteria. Bibliometric analyses of these studies were conducted and are reported. The results highlight the main types of speech synthesis techniques used in TTS systems: concatenative, formant, deep neural network (DNN), hybrid, and multi-agent. We also report the corpora used to develop these systems, the diacritization techniques incorporated, the evaluation techniques applied, and the performance results obtained. Subjective evaluation using the mean opinion score (MOS) is the method most often applied to measure system accuracy. Finally, this study identifies gaps in the literature and recommends directions for future research.