This research aims to highlight the acoustic differences between consonant phonemes of Polish and Lithuanian. To this end, similarity matrices built from speech acoustic parameters are combined with a convolutional neural network (CNN). In the first experiment, we compare how effectively two kinds of similarity matrices discern these differences: matrices built on an extensive set of parameters and matrices built on a reduced set from which highly correlated parameters have been removed. The results show that the matrices that retain the highly correlated parameters yield higher accuracy. In the second experiment, the averaged accuracies of the similarity matrices are compared with those of spectrograms combined with a CNN and with those of acoustic parameter vectors fed to two baseline classifiers, namely k-nearest neighbors and support vector machine. The similarity matrix approach outperforms the methods used for comparison.
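The two preprocessing ideas named above, removing highly correlated parameters and turning a parameter vector into a similarity matrix, can be illustrated with a minimal sketch. The abstract does not specify how either step is carried out, so everything here is an assumption: the Pearson threshold of 0.9, the greedy removal order, the similarity definition (one minus the absolute difference of min-max scaled values), and the hypothetical names `pearson`, `drop_correlated`, and `similarity_matrix`.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant parameter is treated as uncorrelated
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.9):
    """Greedily keep a parameter only if its |r| with every parameter
    kept so far does not exceed the threshold (assumed value: 0.9).

    columns maps a parameter name to its values across all samples.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def similarity_matrix(vector):
    """Assumed definition: S[i][j] = 1 - |v_i - v_j| after min-max
    scaling the vector to [0, 1], giving a square image-like input."""
    lo, hi = min(vector), max(vector)
    span = (hi - lo) or 1.0  # avoid division by zero for constant vectors
    scaled = [(v - lo) / span for v in vector]
    return [[1.0 - abs(a - b) for b in scaled] for a in scaled]


# Toy data: "f2" is almost a multiple of "f1", so it is dropped.
params = {
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [2.0, 4.0, 6.0, 8.1],
    "zcr": [4.0, 1.0, 3.0, 2.0],
}
kept = drop_correlated(params)
matrix = similarity_matrix([0.0, 5.0, 10.0])
```

Under these assumptions, each phoneme sample yields one square matrix whose side equals the number of parameters used, which is the kind of two-dimensional input a CNN can consume; the reduced-set variant simply builds the matrix from the kept parameters only.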