Among the numerous techniques for learning a linear classifier through discriminative dictionary and sparse representation learning of signals, those that learn a nonparametric Bayesian classifier jointly and discriminatively with the dictionary and the corresponding sparse representations have drawn considerable attention from researchers. These techniques jointly learn two sets of sparse representations: one for the training samples over the dictionary and the other for the corresponding labels over the dictionary classifier. At the prediction stage, however, the representations of the test samples computed over the learned dictionary do not truly represent the corresponding labels, which exposes a weakness in the joint learning claim of these techniques. We mitigate this problem and strengthen the joint learning by learning a single set of weights over the dictionary to represent the training data and further optimizing the same weights over the dictionary classifier to represent the labels of the corresponding classes of the training data. As a result, at the prediction stage, the representation weights of the test samples computed over the learned dictionary also represent the labels of the corresponding classes of the test samples, so that the learned dictionary classifier reconstructs the class labels accurately. The accompanying reduction in the number of parameters of the Bayesian model also improves training time. We derive the posterior conditional probabilities of the model analytically and nonparametrically from its overall joint probability using Bayes’ theorem. We then use a Gibbs sampler with the derived conditional probabilities to perform inference on the joint probability of the model, which also supports our claim of efficient optimization of the coupled/joint dictionaries and the sparse representation parameters.
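The prediction stage described above, in which one set of representation weights serves both the dictionary and the dictionary classifier, can be illustrated with a minimal sketch. It is not the Bayesian model of this work: the weights are computed here with a simple greedy orthogonal matching pursuit instead of Gibbs sampling, and the names `D` (dictionary), `W` (dictionary classifier), `omp`, and `predict_label`, as well as the synthetic data, are assumptions introduced only for illustration.

```python
import numpy as np

def omp(D, x, n_nonzero):
    """Greedy orthogonal matching pursuit (stand-in for Bayesian inference).

    Returns a sparse weight vector alpha such that x is approximated by D @ alpha.
    Assumes n_nonzero >= 1 and unit-norm columns (atoms) in D.
    """
    residual = x.copy()
    support = []
    alpha = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # Select the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k not in support:
            support.append(k)
        # Refit the weights on the selected atoms by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    alpha[support] = coef
    return alpha

def predict_label(D, W, x, n_nonzero=3):
    """Compute weights over D, then reuse the SAME weights over W."""
    alpha = omp(D, x, n_nonzero)
    scores = W @ alpha  # reconstructed label vector
    return int(np.argmax(scores)), alpha

# Hypothetical synthetic setup: 12 unit-norm atoms, 3 classes.
rng = np.random.default_rng(0)
n_features, n_atoms, n_classes = 20, 12, 3
D = rng.standard_normal((n_features, n_atoms))
D /= np.linalg.norm(D, axis=0)
atom_class = np.arange(n_atoms) % n_classes  # class assigned to each atom
W = np.eye(n_classes)[:, atom_class]         # one-hot column per atom

x = 2.0 * D[:, 7]                            # test sample built from atom 7
label, alpha = predict_label(D, W, x, n_nonzero=1)
print(label, atom_class[7])
```

Because the same vector `alpha` multiplies both `D` and `W`, the weights that reconstruct the sample are also the weights that reconstruct its label, which is the coupling the approach relies on at test time.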
We demonstrate the effectiveness of our approach through experiments on standard datasets: the Extended YaleB and AR face databases for face recognition, the Caltech-101 and Fifteen Scene Category databases for categorization, and the UCF sports action database for action recognition. We compare the results with those of state-of-the-art methods in the area. The classification accuracies of our approach on these datasets, i.e., 93.25%, 89.27%, 94.81%, 98.10%, and 95.00%, are on average 0.5 to 2% higher than those of the compared methods. The overall average margin of error of the confidence intervals is 0.24 for our approach, compared with 0.34 for the second-best approach, JBDC. The AUC–ROC scores of our approach are 0.98 and 0.992, compared with 0.960 and 0.98, respectively, for the other methods. Our approach is also computationally efficient.
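For readers unfamiliar with the margin of error of a confidence interval, the following minimal sketch shows one common way to compute it from repeated accuracy measurements. The abstract does not state how the intervals were obtained, so the normal approximation with z = 1.96 (a 95% interval), the function name `margin_of_error`, and the accuracy values are all assumptions made for illustration; they are not results from the experiments.

```python
import math
import statistics

def margin_of_error(accuracies, z=1.96):
    """Half-width of a normal-approximation confidence interval.

    accuracies: accuracy values (in %) from repeated runs, at least two.
    z: critical value; 1.96 corresponds to a 95% interval (assumed here).
    """
    n = len(accuracies)
    sample_std = statistics.stdev(accuracies)  # uses the n - 1 denominator
    return z * sample_std / math.sqrt(n)

# Hypothetical accuracies from three runs, for illustration only.
runs = [90.0, 92.0, 94.0]
mean_acc = statistics.mean(runs)
moe = margin_of_error(runs)
print(f"{mean_acc:.2f} +/- {moe:.2f}")
```

A smaller margin of error indicates that the accuracy varies less across runs, which is the sense in which 0.24 compares favorably with 0.34.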