Digital watermarking, a technology for copyright protection, integrity verification, copy prevention, and distribution tracking of digital products, is now commonly used to protect confidential documents and files within enterprises. Because the PDF format is widely used for enterprise documents, this paper presents a method for recognizing PDF document watermarks based on natural language processing. We collect a large number of PDF documents, segment their text content with an improved N-gram language model based on forward and reverse matching algorithms, and build a KenLM language model, based on language model probability and conditional probability calculation rules, to identify PDF document watermarks. This improves the accuracy of PDF document watermark recognition. The validity of the method is verified by selecting PDF documents from an enterprise, training the language model, and calculating the prediction accuracy.
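The segmentation step can be illustrated with a minimal sketch of plain bidirectional maximum matching, the standard technique that combines forward and reverse matching. This is not the paper's improved N-gram variant, whose details the abstract does not give. The function names, the toy vocabulary, the sample string, and the tie-breaking rule (fewer tokens, then fewer single-character tokens, then the reverse result) are all assumptions made for illustration.

```python
def forward_max_match(text, vocab, max_len):
    """Scan left to right, always taking the longest vocabulary word."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, vocab, max_len):
    """Scan right to left, always taking the longest vocabulary word."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the text
    return tokens


def bidirectional_segment(text, vocab):
    """Run both scans and keep the one with the lower heuristic cost."""
    max_len = max((len(w) for w in vocab), default=1)
    fwd = forward_max_match(text, vocab, max_len)
    bwd = backward_max_match(text, vocab, max_len)

    def cost(tokens):
        # Assumed heuristic: fewer tokens, then fewer single-character tokens.
        return (len(tokens), sum(1 for t in tokens if len(t) == 1))

    # On a tie the reverse-matching result is kept (an assumed convention).
    return fwd if cost(fwd) < cost(bwd) else bwd


# Hypothetical toy vocabulary and unspaced sample text.
VOCAB = {"the", "table", "down", "there", "theta", "bled", "own"}
SAMPLE = "thetabledownthere"
segments = bidirectional_segment(SAMPLE, VOCAB)
```

On this sample the forward scan yields `theta / bled / own / there` and the reverse scan yields `the / table / down / there`. The two tie on cost, so the reverse result is returned. In a pipeline like the one described, the resulting tokens would then be scored by the language model.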