Recent neuroscience studies in awake and behaving animals demonstrate that a deeper understanding of brain function requires a deeper understanding of behavior. Detailed behavioral measurements are now often collected using video cameras, resulting in an increased need for computer vision algorithms that extract useful information from video data. In this work, we introduce a new semi-supervised framework that combines the output of supervised pose estimation algorithms (e.g., DeepLabCut) with unsupervised dimensionality reduction methods to produce interpretable, low-dimensional representations of behavioral videos that extract more information than pose estimates alone. We demonstrate this method, the Partitioned Subspace Variational Autoencoder (PS-VAE), on head-fixed mouse behavioral videos. In a close-up video of a mouse face, where we track pupil location and size, our method extracts unsupervised outputs that correspond to the eyelid and whisker pad positions, with no additional user annotations required. We use the resulting interpretable behavioral representation to construct saccade and whisking detectors, and quantify the accuracy with which these signals can be decoded from neural activity in visual cortex. In a two-camera mouse video, we show how our method separates movements of experimental equipment from animal behavior and extracts unsupervised features such as chest position, again with no additional user annotations required. This allows us to construct paw and body movement detectors, and to decode individual features of behavior from widefield calcium imaging data. Our results demonstrate how the interpretable partitioning of behavioral videos provided by the PS-VAE can facilitate downstream behavioral and neural analyses.