The recognition performance of optical character recognition (OCR) models can be sub-optimal when document images suffer from various degradations. Supervised deep learning methods for image enhancement can generate high-quality enhanced images, but they require corresponding clean images or ground-truth text, a requirement that is often difficult to fulfill for real-world noisy documents. For instance, it can be challenging to create paired noisy/clean training datasets or to obtain ground-truth text for noisy point-of-sale receipts and invoices. Unsupervised methods have been explored in recent years to enhance images in the absence of ground-truth images or text, but these methods focus on natural scene images. For document images, preserving the readability of text in the enhanced images is of utmost importance for improved OCR performance. In this work, we propose a modification to the CycleGAN architecture that improves its ability to enhance document images while better preserving text. Inspired by the success of combined CNN-BiLSTM networks in text recognition models, we replace the discriminator in the CycleGAN model with a combined CNN-BiLSTM network, enabling better feature extraction from document images during the discriminator's real/fake classification. Results indicate that our proposed model not only preserves text better and improves OCR performance over the baseline CycleGAN model but also outperforms classical unsupervised image pre-processing techniques such as Sauvola and Otsu binarization.
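To make the proposed discriminator modification concrete, the following is a minimal PyTorch sketch of one plausible realization: a convolutional feature extractor whose feature map is read column-by-column as a sequence and scored by a bidirectional LSTM. The layer sizes, the height pooling, and the per-column logits are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class CNNBiLSTMDiscriminator(nn.Module):
    """Hypothetical sketch of a CNN-BiLSTM discriminator for CycleGAN.

    The CNN extracts local visual features from the (grayscale) document
    image; the feature map is then pooled over height and read left to
    right as a sequence, so the BiLSTM can use horizontal context along
    text lines when deciding real vs. fake.
    """

    def __init__(self, in_channels=1, hidden_size=256):
        super().__init__()
        # Convolutional feature extractor (layer sizes are illustrative).
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.InstanceNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1),
            nn.InstanceNorm2d(256),
            nn.LeakyReLU(0.2, inplace=True),
        )
        # Bidirectional LSTM over the width dimension of the feature map.
        self.bilstm = nn.LSTM(input_size=256, hidden_size=hidden_size,
                              bidirectional=True, batch_first=True)
        # Map each step's BiLSTM output to a real/fake logit.
        self.classifier = nn.Linear(2 * hidden_size, 1)

    def forward(self, x):
        f = self.cnn(x)              # (B, C, H, W)
        f = f.mean(dim=2)            # pool over height -> (B, C, W)
        f = f.permute(0, 2, 1)       # sequence over width -> (B, W, C)
        seq, _ = self.bilstm(f)      # (B, W, 2 * hidden_size)
        return self.classifier(seq)  # per-column logits (B, W, 1)
```

Under these assumptions, the per-column logits would be trained against the usual CycleGAN adversarial targets (e.g., a least-squares GAN loss averaged over the sequence), with the generator and cycle-consistency losses unchanged.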