A theoretical framework is proposed for the simultaneous reconstruction of three-dimensional grain shapes and of the intragranular strain and orientation fields inside polycrystals from near-field X-ray diffraction images acquired with box-beam illumination. The approach, named Iterative Tensor Field (ITF) reconstruction, uses a tensor field representation and a kinematical forward simulation model to reproduce the measured diffraction signal from individual grains. The framework links the local deformation components inside the grains to the intensities of the diffraction signal in the measured images by forming a local linear problem. This problem is solved with a large-scale linear optimisation method in every main iteration of the underlying non-linear problem. The optimisation enforces smooth gradients, and the objective function may include regularisation terms that impose static equilibrium or incorporate input from a Crystal Plasticity FEM simulation. The method has modest computational requirements and enables efficient scanning of millimetre or sub-millimetre sized specimens. Results on experimental data measured on a Gum metal specimen are presented, which demonstrate convergence and the feasibility of the approach. The mathematical formulation, data representation and challenges in the reconstruction and validation are discussed. The physical aspects of the contrast phenomenon, the deformation sensitivity of the technique, and potential means of error assessment are described. A number of alternative concepts for a polycrystalline deformation model and potential solvers are also presented.
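
The iteration structure summarised above (a local linear problem solved inside each main iteration of a non-linear problem, with a smoothness penalty on the gradients) can be illustrated with a minimal sketch. It is not the ITF implementation: the one-dimensional scalar field, the toy `forward` model, its `jacobian`, the weight `lam` and the dense least-squares solver are all assumptions made for illustration, standing in for the tensor field, the kinematical diffraction simulation and the large-scale solver.

```python
import numpy as np

def forward(x):
    # Hypothetical toy forward model (NOT the kinematical diffraction model):
    # a smooth non-linear map from a 1D field to simulated "intensities".
    return np.tanh(x) + 0.1 * x**2

def jacobian(x):
    # Derivative of the toy forward model; diagonal because the toy model
    # acts on each sample independently.
    return np.diag(1.0 - np.tanh(x)**2 + 0.2 * x)

def first_difference(n):
    # (n-1) x n finite-difference operator used to penalise rough gradients.
    return np.diff(np.eye(n), axis=0)

def reconstruct(measured, n_iter=20, lam=1e-2):
    # Gauss-Newton style loop: each main iteration linearises the forward
    # model around the current estimate and solves a regularised linear
    # least-squares problem for the update dx.
    n = measured.size
    D = first_difference(n)
    x = np.zeros(n)
    for _ in range(n_iter):
        J = jacobian(x)
        r = measured - forward(x)
        # Local linear problem:
        #   min_dx ||J dx - r||^2 + lam * ||D (x + dx)||^2
        A = np.vstack([J, np.sqrt(lam) * D])
        b = np.concatenate([r, -np.sqrt(lam) * (D @ x)])
        dx, *_ = np.linalg.lstsq(A, b, rcond=None)
        x = x + dx
    return x

# Synthetic smooth field and its simulated, noise-free measurement.
truth = 0.3 * np.sin(np.linspace(0.0, np.pi, 50))
estimate = reconstruct(forward(truth))
residual = np.linalg.norm(forward(estimate) - forward(truth))
```

In this sketch the smoothness term is stacked under the linearised data term so that a single least-squares solve handles both. Further regularisation terms, such as an equilibrium constraint, could in principle be appended as additional rows in the same way, although how the actual framework assembles them is not specified here.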