Unsupervised machine learning models build an internal representation of their training data without explicit human guidance or feature engineering. This learned representation reveals which features of the data are relevant for the task at hand. In the context of quantum physics, training models to describe quantum states without human intervention is a promising route to understanding how machines represent complex quantum states. Interpreting the learned representation may offer a new perspective on non-trivial features of quantum systems and on how to represent them efficiently.
We train a generative model on two-qubit density matrices produced by a parameterized quantum circuit (a minimal sketch of one such setup is given below). In a series of computational experiments, we investigate the model's learned representation and its internal understanding of the data. We observe that the model learns an interpretable representation that relates the quantum states to their underlying entanglement characteristics. In particular, our results demonstrate
that the latent representation of the model is directly correlated with the entanglement measure
concurrence (see the sketch below). The insights from this study represent a proof of concept for interpretable machine learning of quantum states. Our approach offers insight into how machines autonomously learn to represent small-scale quantum systems.
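
As a minimal sketch of the kind of training data involved, the following assumes a simple ansatz of single-qubit RY rotations followed by a CNOT acting on |00>; the actual circuit parameterization may differ, and the helper names here are illustrative.

```python
import numpy as np

def ry(theta: float) -> np.ndarray:
    """Single-qubit rotation about the Y axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

# CNOT with the first qubit as control.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

def density_matrix(theta1: float, theta2: float) -> np.ndarray:
    """Two-qubit density matrix |psi><psi| prepared by the circuit
    CNOT (RY(theta1) x RY(theta2)) acting on |00>."""
    psi0 = np.zeros(4)
    psi0[0] = 1.0
    psi = CNOT @ np.kron(ry(theta1), ry(theta2)) @ psi0
    return np.outer(psi, psi.conj())

# Sample a small training set over random circuit angles.
rng = np.random.default_rng(seed=0)
dataset = [density_matrix(*rng.uniform(0, np.pi, size=2)) for _ in range(1000)]
```

Varying the two angles sweeps the states from product states to maximally entangled ones, which is what makes a latent representation correlated with entanglement plausible in the first place.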
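The entanglement measure referenced above, concurrence, can be computed for any two-qubit density matrix via the standard Wootters formula; this sketch operates on density matrices such as those generated above.

```python
import numpy as np

def concurrence(rho: np.ndarray) -> float:
    """Wootters concurrence C(rho) = max(0, l1 - l2 - l3 - l4), where the l_i
    are the decreasing square roots of the eigenvalues of rho @ rho_tilde."""
    sigma_y = np.array([[0.0, -1.0j], [1.0j, 0.0]])
    flip = np.kron(sigma_y, sigma_y)
    rho_tilde = flip @ rho.conj() @ flip          # spin-flipped state
    eigvals = np.linalg.eigvals(rho @ rho_tilde)  # real, non-negative up to noise
    lams = np.sort(np.sqrt(np.clip(eigvals.real, 0.0, None)))[::-1]
    return float(max(0.0, lams[0] - lams[1] - lams[2] - lams[3]))

# Example: the Bell state (|00> + |11>)/sqrt(2) has concurrence 1.
bell = np.zeros((4, 1))
bell[0] = bell[3] = 1 / np.sqrt(2)
rho_bell = bell @ bell.T.conj()
print(concurrence(rho_bell))  # ~1.0
```

Concurrence ranges from 0 for separable states to 1 for maximally entangled states, which makes it a natural ground-truth quantity to correlate against a learned latent variable.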