Recognizing text in natural scene images and videos has been a challenging task for the computer vision and machine learning research community in recent years. Such text is difficult to recognize because of complex backgrounds and variations in color, shape, and size. Nevertheless, text recognition is highly useful for indexing, keyword-based image and video search, and information retrieval. In this paper, a model is proposed to detect isolated text characters in photographic images of natural scenes. The proposed model combines a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) to recognize text in natural images. The model uses two networks: the first network combines low-level and middle-level features to increase the feature size and passes the enriched information to the second network, where the features are widened again by combining them with high-level features, resulting in powerful and robust features. The proposed model is evaluated on the ICDAR2003 (IC03), ICDAR2013 (IC13), and Street View Text (SVT) datasets. In addition, an extensive Tamil news ticker image dataset has been developed to evaluate the model. The experimental results show that the combined feature fusion technique outperforms the compared methods on the ICDAR2003, ICDAR2013, SVT, and Tamil news ticker datasets.
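The two-stage widening of features described above can be illustrated with a minimal sketch. The abstract does not state the fusion operator, so this sketch assumes channel-wise concatenation after average-pooling each feature map to a common spatial size. The helper names `avg_pool_to` and `fuse`, the random arrays standing in for CNN feature maps, and all shapes are hypothetical choices made for illustration, not details of the proposed model.

```python
import numpy as np

def avg_pool_to(feat, out_h, out_w):
    """Average-pool a (C, H, W) feature map down to (C, out_h, out_w).

    Assumes H and W are integer multiples of out_h and out_w.
    """
    c, h, w = feat.shape
    fh, fw = h // out_h, w // out_w
    # Split each spatial axis into (output cells, cell size) and average each cell.
    return feat.reshape(c, out_h, fh, out_w, fw).mean(axis=(2, 4))

def fuse(*feats):
    """Fuse feature maps by concatenating along the channel axis.

    Concatenation is an assumed fusion operator: every map is first pooled
    to the smallest spatial size among the inputs.
    """
    out_h = min(f.shape[1] for f in feats)
    out_w = min(f.shape[2] for f in feats)
    pooled = [avg_pool_to(f, out_h, out_w) for f in feats]
    return np.concatenate(pooled, axis=0)

# Hypothetical stand-ins for low-, middle-, and high-level feature maps.
rng = np.random.default_rng(0)
low = rng.standard_normal((16, 32, 32))
mid = rng.standard_normal((32, 16, 16))
high = rng.standard_normal((64, 8, 8))

# Stage 1: the first network widens features by fusing low- and middle-level maps.
stage1 = fuse(low, mid)
# Stage 2: the second network widens them again with high-level features.
stage2 = fuse(stage1, high)

print(stage1.shape, stage2.shape)
```

Under these assumptions the channel count grows at each stage (16 + 32, then 48 + 64) while the spatial resolution follows the coarsest input. In a CNN-RNN pipeline, a fused map of this kind would typically be reshaped into a sequence before being passed to the recurrent layers.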