Dictionary learning, which aims to represent a signal in terms of the atoms of a dictionary, has gained popularity in a wide range of applications, including image denoising, face recognition, remote sensing, medical imaging, and feature extraction. Dictionary learning can be seen as a data-driven alternative for solving inverse problems, in which the data are identified with possible outputs that are either generated numerically using a forward model or obtained from earlier observations of controlled experiments. Sparse dictionary learning is particularly interesting when the underlying signal is known to be representable in terms of a few vectors in a given basis. In this paper, we propose to use hierarchical Bayesian models for sparse dictionary learning that can capture features of the underlying signals, e.g., sparse representation and nonnegativity. The same framework can be employed to reduce the dimensionality of an annotated dictionary through feature extraction, thus reducing the computational complexity of the learning task. Computed examples in which our algorithms are applied to hyperspectral imaging and to the classification of electrocardiogram (ECG) data are also presented.