Soft robotic grippers are increasingly desired in applications that involve grasping of complex and deformable objects. However, their flexible nature and non-linear dynamics makes the modelling and control difficult. Numerical techniques such as Finite Element Analysis (FEA) present an accurate way of modelling complex deformations. However, FEA approaches are computationally expensive and consequently challenging to employ for real-time control tasks. Existing analytical techniques simplify the modelling by approximating the deformed gripper geometry. Although this approach is less computationally demanding, it is limited in design scope and can lead to larger estimation errors. In this paper, we present a learning based framework that is able to predict contact forces as well as stress distribution from soft Fin Ray Effect (FRE) finger images in real-time. These images are used to learn internal representations for deformations using a deep neural encoder, which are further decoded to contact forces and stress maps using separate branches. The entire network is jointly learned in an end-to-end fashion. In order to address the challenge of having sufficient labelled data for training, we employ FEA to generate simulated images to supervise our framework. This leads to an accurate prediction, faster inference and availability of large and diverse data for better generalisability. Furthermore, our approach is able to predict a detailed stress distribution that can guide grasp planning, which would be particularly useful for delicate objects. Our proposed approach is validated by comparing the predicted contact forces to the computed ground-truth forces from FEA as well as real force sensor. We rigorously evaluate the performance of our approach under variations in contact point, object material, object shape, viewing angle, and level of occlusion.