We introduce _Arabic-Nougat_, a suite of OCR models designed to convert Arabic book pages into structured Markdown text. Building on Meta's _Nougat_ architecture, _Arabic-Nougat_ includes three specialized models: _arabic-small-nougat_, _arabic-base-nougat_, and _arabic-large-nougat_. These models are fine-tuned on a synthetic dataset, _arabic-img2md_, consisting of 13.7k paired samples of Arabic book pages and their Markdown representations. Key contributions include the _Aranizer-PBE-86k_ tokenizer, which improves tokenization efficiency for Arabic text, and the use of torch.bfloat16 precision together with Flash Attention 2 for efficient training and inference. Our models significantly outperform existing methods, with _arabic-large-nougat_ achieving the highest Markdown Structure Accuracy and the lowest Character Error Rate. We also release a large-scale dataset of 1.1 billion Arabic tokens extracted from over 8,500 books using our best-performing model, providing a valuable resource for further Arabic OCR research. All models and datasets are open-sourced, and our implementation is available at https://github.com/MohamedAliRashad/arabic-nougat.
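For readers who want to try the released models, the following is a minimal inference sketch using the Hugging Face `transformers` library. It applies the torch.bfloat16 precision and Flash Attention 2 settings mentioned above; the Hub repo id `MohamedAliRashad/arabic-base-nougat` and the input filename are assumptions for illustration, not confirmed by this abstract.

```python
# Minimal sketch: run an Arabic-Nougat checkpoint on one page image.
# Assumes the checkpoint is published on the Hugging Face Hub under the
# author's namespace (repo id below is an assumption).
import torch
from PIL import Image
from transformers import NougatProcessor, VisionEncoderDecoderModel

model_id = "MohamedAliRashad/arabic-base-nougat"  # assumed Hub id
processor = NougatProcessor.from_pretrained(model_id)
model = VisionEncoderDecoderModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,                # bfloat16 precision, as in the abstract
    attn_implementation="flash_attention_2",   # Flash Attention 2 kernels
).to("cuda")

# Load a scanned book page and convert it to pixel values.
image = Image.open("book_page.png").convert("RGB")  # hypothetical input file
pixel_values = processor(image, return_tensors="pt").pixel_values

# Autoregressively decode the page into Markdown.
outputs = model.generate(
    pixel_values.to(model.device, model.dtype),
    max_new_tokens=2048,
)
markdown = processor.batch_decode(outputs, skip_special_tokens=True)[0]
print(markdown)
```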