We propose a 3D energy minimization framework for enhancing tagged Cardiac Magnetic Resonance (CMR) image sequences, based on learning first- and second-order visual appearance models. The framework aims to reduce noise within a tag line, unsharpen the tag edges in the spatial domain, and amplify the tag-to-background contrast. The first-order appearance model uses adaptive Linear Combinations of Discrete Gaussians (LCDG) to accurately approximate the empirical marginal probability distribution of CMR signals for a given sequence and to separate it into tag and background submodels; it is also used to classify the tag lines and the background. The second-order model treats image sequences as samples of a translation- and rotation-invariant 3D Markov-Gibbs Random Field (MGRF) with multiple pairwise voxel interactions. A 3D energy function for this model is built from analytical estimates of the spatiotemporal interaction geometry and the Gibbs interaction potentials. To improve strain estimation, the given sequence is adjusted by comparison with the energy minimizer, which enhances tag and background homogeneity and contrast. Special 3D geometric phantoms, motivated by statistical analysis of the tagged CMR data, have been designed to validate the accuracy of our approach. Experiments with the phantoms and eight real data sets have confirmed the high accuracy of the functional parameters estimated from the enhanced tagged sequences using popular spectral techniques such as Harmonic Phase (HARP).
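
As a rough illustration of the first-order step described above, the following minimal sketch fits a two-component Gaussian mixture to 1D signal intensities with expectation-maximization and labels each intensity by the more probable component. This is a deliberately simplified stand-in, not the LCDG model itself: LCDG uses discrete Gaussians with both positive and negative components, which the sketch omits. The function names (`gaussian_pdf`, `fit_two_gaussians`, `classify`), the quartile-based initialization, and the assumption that the darker component corresponds to the tag lines are all hypothetical choices made for illustration.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a 1D Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(samples, n_iter=50):
    """Fit a two-component 1D Gaussian mixture with EM.

    Simplified stand-in for LCDG (assumption): only two positive
    components, one per class. Returns (weights, means, sigmas),
    ordered so that index 0 is the darker (lower-mean) component.
    """
    xs = sorted(float(v) for v in samples)
    n = len(xs)
    # Initialise the means at the lower and upper quartiles (assumption).
    mu = [xs[n // 4], xs[(3 * n) // 4]]
    mean = sum(xs) / n
    spread = math.sqrt(sum((v - mean) ** 2 for v in xs) / n) + 1e-6
    sigma = [spread, spread]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        resp = []
        for v in xs:
            p = [w[k] * gaussian_pdf(v, mu[k], sigma[k]) for k in range(2)]
            total = p[0] + p[1] + 1e-300  # guard against division by zero
            resp.append((p[0] / total, p[1] / total))
        # M-step: re-estimate weight, mean, and spread of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            w[k] = nk / n
            mu[k] = sum(r[k] * v for r, v in zip(resp, xs)) / nk
            var = sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, xs)) / nk
            sigma[k] = math.sqrt(var) + 1e-6
    order = sorted(range(2), key=lambda k: mu[k])
    return ([w[k] for k in order],
            [mu[k] for k in order],
            [sigma[k] for k in order])


def classify(value, w, mu, sigma):
    """Return 0 (darker component, assumed tag) or 1 (assumed background)."""
    scores = [w[k] * gaussian_pdf(value, mu[k], sigma[k]) for k in range(2)]
    return 0 if scores[0] >= scores[1] else 1
```

In use, `fit_two_gaussians` would be called once on the intensities of a whole sequence, and `classify` then applied to each voxel intensity. The second-order MGRF model and the energy minimization are not covered by this sketch.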