Present-day communication systems routinely use codes that approach channel capacity when coupled with a computationally efficient decoder. However, the decoder is typically designed for the Gaussian noise channel and is known to be sub-optimal for non-Gaussian noise distributions. Deep learning methods offer a new approach for designing decoders that can be trained and tailored for arbitrary channel statistics. We focus on Turbo codes and propose DEEPTURBO, a novel deep learning based architecture for Turbo decoding. The standard Turbo decoder (TURBO) iteratively applies the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm with an interleaver in the middle. A neural architecture for Turbo decoding, termed NEURALBCJR, was proposed recently. There, the key idea is to create a module that imitates the BCJR algorithm using supervised learning, and to use the interleaver architecture along with this module, which is then fine-tuned using end-to-end training. However, knowledge of the BCJR algorithm is required to design such an architecture, which also constrains the resulting learnt decoder. Here we remove this requirement and propose a fully end-to-end trained neural decoder, the Deep Turbo Decoder (DEEPTURBO). With a novel learnable decoder structure and training methodology, DEEPTURBO achieves superior performance under both AWGN and non-AWGN settings compared to the other two decoders, TURBO and NEURALBCJR. Furthermore, among the three, DEEPTURBO exhibits the lowest error floor.
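The iterative structure described above, two soft-in soft-out (SISO) decoding stages exchanging extrinsic information through an interleaver, can be illustrated with a minimal Python sketch. It is not the implementation of TURBO, NEURALBCJR, or DEEPTURBO. The names `make_interleaver`, `toy_siso`, and `turbo_decode` are hypothetical, the LLR sign convention (positive means bit 0) is an assumption, and `toy_siso` is a trivial stand-in that ignores the parity stream rather than running BCJR. A BCJR routine or a trained network would occupy that slot in a real decoder.

```python
import numpy as np

def make_interleaver(block_len, seed=0):
    """Return a pseudo-random permutation and its inverse (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(block_len)
    inv = np.argsort(perm)  # x[perm][inv] recovers x
    return perm, inv

def toy_siso(sys_llr, par_llr, prior_llr):
    """Stand-in SISO stage, NOT BCJR: it ignores parity and adds the prior
    to the systematic LLRs. A BCJR or learned module would go here."""
    return sys_llr + prior_llr

def turbo_decode(sys_llr, par1_llr, par2_llr, perm, inv, siso1, siso2, num_iters=6):
    """Iterate two SISO stages with an interleaver between them.

    Assumed convention: LLR = log P(b=0)/P(b=1), so a negative LLR decodes to 1.
    """
    prior = np.zeros_like(sys_llr)
    posterior = sys_llr.copy()
    for _ in range(num_iters):
        # Stage 1 works in natural bit order.
        post1 = siso1(sys_llr, par1_llr, prior)
        ext1 = post1 - sys_llr - prior            # extrinsic information only
        # Stage 2 works in interleaved order.
        sys_int, prior_int = sys_llr[perm], ext1[perm]
        post2 = siso2(sys_int, par2_llr, prior_int)
        ext2 = post2 - sys_int - prior_int
        # De-interleave and feed back as the next prior for stage 1.
        prior = ext2[inv]
        posterior = post2[inv]
    return (posterior < 0).astype(int)

perm, inv = make_interleaver(8, seed=1)
sys_llr = np.array([2.0, -1.0, 0.5, -3.0, 1.0, -0.2, 4.0, -2.5])
bits = turbo_decode(sys_llr, np.zeros(8), np.zeros(8), perm, inv, toy_siso, toy_siso, num_iters=3)
print(bits)
```

Because the stand-in stage contributes no extrinsic information, the result here reduces to a hard decision on the systematic LLRs. The sketch is meant to show only where the interleaver and the pluggable SISO module sit in the loop.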