Deep learning algorithms, especially Convolutional Neural Networks (CNNs), have developed rapidly because their flexibility and scalability allow them to be adopted in many fields for real-world applications such as object detection and image classification. However, their high accuracy comes at the cost of intensive computation. It is therefore crucial to choose a suitable computing platform and implementation methodology for CNN architectures in order to achieve high efficiency. Parallel architectures are prevalent in CNN implementation. Herein, we present a new Single Instruction Multiple Data (SIMD) parallel implementation of the proposed CNN that speeds up execution and makes the model suitable for deployment on low-cost, low-power platforms. The proposed implementation yields an improved deep CNN model that runs on a cost-efficient platform, is portable, and works autonomously on multi-core processing units while maintaining its accuracy. The Raspberry Pi 3 B serves as the low-power target device for implementing our model. The proposed approach achieves a diagnostic accuracy of up to 96.35 % at a power consumption of 3.65 W, a power reduction of between 19.17 % and 68.45 % compared with prior work. It also achieves a reasonable inference time on the selected platform. These results reflect the success of employing parallel architectures to utilize the four cores of the ARM processor on the target platform. The presented model can serve as an efficient medical assistant that provides automated detection and diagnosis of myopia. Thus, it can be a promising healthcare tool that reduces the effort of medical staff and improves the quality of the medical services provided to myopia patients.
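To make the reported power figures concrete, the following minimal sketch back-computes the baseline power implied by each reduction percentage. It assumes the reduction is defined relative to the prior work's consumption, i.e. reduction = (P_prior - P_ours) / P_prior; the abstract does not state this definition, and the function name `implied_baseline_power` is hypothetical. The resulting baselines are arithmetic consequences of that assumption, not measurements reported here.

```python
def implied_baseline_power(measured_watts: float, reduction_percent: float) -> float:
    """Return the baseline power P_prior implied by a relative reduction.

    Assumes reduction_percent = 100 * (P_prior - measured_watts) / P_prior,
    which is an assumption about how the reduction was computed.
    """
    if not 0.0 <= reduction_percent < 100.0:
        raise ValueError("reduction_percent must be in the range [0, 100)")
    return measured_watts / (1.0 - reduction_percent / 100.0)


# Figures quoted in the abstract.
MEASURED_WATTS = 3.65
REPORTED_REDUCTIONS = (19.17, 68.45)

for reduction in REPORTED_REDUCTIONS:
    baseline = implied_baseline_power(MEASURED_WATTS, reduction)
    print(f"{reduction:.2f} % reduction implies a baseline of about {baseline:.2f} W")
```

Under this assumption, the two percentages would correspond to prior-work consumption of roughly 4.52 W and 11.57 W respectively.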