Digital watermarking offers invisibility and resistance to attacks, so it has been widely used in copyright protection and information hiding. Watermarking schemes balance the invisibility and robustness of the watermark by controlling its embedding strength and position, mainly in the transform domain. In this paper, the discrete cosine transform (DCT) is adopted to transform a given image from the spatial domain to the frequency domain, where the watermark information is added. To meet future demands for batch and cloud-based image watermarking, this paper optimizes the DCT algorithm and the data precision, and deploys the designed accelerator kernel on an FPGA cloud platform to speed up watermark processing. Cloud-based data processing is a development trend of the big data era. The cloud platform adopted in this paper is based on an OpenCL heterogeneous architecture combining a CPU and an FPGA. The cloud-based implementation makes the digital watermarking application highly extensible, widely shareable, and more secure. The whole system implements a complete cloud processing pipeline comprising image decoding, image preprocessing, watermark embedding, and watermarked image encoding. The watermarking algorithm is accelerated by the efficient parallel computing capabilities of the FPGA. The resulting acceleration is remarkable: the design achieves a state-of-the-art throughput of 1.676 GBps and a peak processing speed of 937 FPS for 800 × 800 color images.

INDEX TERMS Cloud services, parallel operation, integer DCT, data encryption and watermarking, heterogeneous architecture.
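
As a rough illustration of frequency-domain embedding, the following minimal sketch hides one bit in a single DCT coefficient of an 8 × 8 block and recovers it afterwards. It is not the accelerator kernel described above. The block size, the floating-point DCT (rather than the integer DCT named in the index terms), the parity-based quantization rule, and the names `embed_bit`, `extract_bit`, `pos`, and `step` are all assumptions made for illustration, and the code runs on the CPU in plain Python.

```python
import math

N = 8  # block size (assumed; the abstract does not state one)

def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix C, so that X = C * x * C^T."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

C = dct_matrix()
CT = transpose(C)

def dct2(block):
    """2-D DCT of a square block (spatial -> frequency domain)."""
    return matmul(matmul(C, block), CT)

def idct2(coeffs):
    """Inverse 2-D DCT (frequency -> spatial domain)."""
    return matmul(matmul(CT, coeffs), C)

def embed_bit(block, bit, pos=(3, 2), step=16.0):
    """Embed one bit by moving the coefficient at `pos` to the centre of a
    quantization bin whose index has the same parity as `bit`.
    `step` is the embedding strength: larger is more robust, more visible."""
    coeffs = dct2(block)
    u, v = pos
    q = math.floor(coeffs[u][v] / step)
    if q % 2 != bit:
        q += 1
    coeffs[u][v] = (q + 0.5) * step
    return idct2(coeffs)

def extract_bit(block, pos=(3, 2), step=16.0):
    """Recover the bit from the parity of the coefficient's bin index."""
    u, v = pos
    return math.floor(dct2(block)[u][v] / step) % 2

# Hypothetical 8 x 8 luminance block with mid-range pixel values.
block = [[float((16 * i + 8 * j) % 200 + 20) for j in range(N)] for i in range(N)]

for bit in (0, 1):
    marked = embed_bit(block, bit)
    pixels = [[round(p) for p in row] for row in marked]  # back to integer pixels
    print(bit, extract_bit(pixels))
```

Choosing a mid-frequency position such as `(3, 2)` and a moderate `step` reflects the trade-off mentioned above: low frequencies make the change more visible, while high frequencies are more easily destroyed by compression.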