Sparse data models, where data is assumed to be well represented as a linear combination of a few elements from a dictionary, have gained considerable attention in recent years, and their use has led to state-of-the-art results in many signal and image processing tasks. It is now well understood that the choice of the sparsity regularization term is critical to the success of such models. In this work, we use tools from information theory, and in particular universal coding theory, to propose a framework for designing sparsity regularization terms that have several theoretical and practical advantages over the more standard ℓ0 or ℓ1 ones, and that lead to improved coding performance and accuracy in reconstruction and classification tasks. We also report further improvements obtained by imposing low mutual coherence and Gram matrix norm constraints on the corresponding learned dictionaries. The presentation of the framework and its theoretical foundations is complemented with examples in image denoising and classification.
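To make the sparse-model setting concrete, the following is a minimal sketch (not the paper's proposed universal regularizer) of standard ℓ1-regularized sparse coding, solved with the iterative shrinkage-thresholding algorithm (ISTA). The dictionary, signal, and parameter values are illustrative assumptions.

```python
import numpy as np

def soft_threshold(x, t):
    # Soft-thresholding: the proximal operator of the l1 norm.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(D, y, lam, n_iter=500):
    # Minimize 0.5 * ||y - D a||^2 + lam * ||a||_1 by iterative
    # shrinkage-thresholding (gradient step + soft threshold).
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a + D.T @ (y - D @ a) / L, lam / L)
    return a

# Toy example: a signal built from 2 of 20 unit-norm dictionary atoms.
rng = np.random.default_rng(0)
D = rng.standard_normal((30, 20))
D /= np.linalg.norm(D, axis=0)
a_true = np.zeros(20)
a_true[[3, 11]] = [1.5, -2.0]
y = D @ a_true
a_hat = ista(D, y, lam=0.05)
```

The ℓ1 penalty here plays the role of the sparsity regularization term discussed above; the paper's contribution is to replace it with terms derived from universal coding theory.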