We propose a hierarchical approach for Bayesian modeling and segmentation of continuous sequences of bimanual object manipulations. Based on bimodal (audio and tactile) low-level time series, the approach identifies semantically distinct subsequences. It consists of two hierarchically executed stages, each of which employs a Bayesian method for unsupervised change point detection (Fearnhead, 2005). In the first stage, we apply a mixture of model pairs to the bimanual tactile data, selecting "object interaction" and "no object interaction" regions for the left and right hands synchronously. In the second stage, we apply a set of autoregressive (AR) models to the audio data, which allows us to subdivide "object interaction" segments according to qualitative changes in the audio signal. Two simple model types that yield modality-specific segment likelihoods form the foundation of this modeling approach. Empirical evaluation against the acquired ground truth showed that the generated segments correctly capture the semantic structure of the test time series. The developed procedure can serve as a building block for higher-level action and activity modeling frameworks.
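To make the second-stage idea concrete, the following is a minimal illustrative sketch, not the method of the paper: Fearnhead (2005) uses exact recursions over the change point posterior, whereas this toy version finds a MAP segmentation by dynamic programming, scoring each candidate segment with a least-squares AR fit and charging a fixed per-change-point penalty. All function names, the AR order, the minimum segment length, and the penalty value are illustrative assumptions.

```python
import numpy as np

def ar_segment_loglik(x, order=2, min_len=10):
    """Profile Gaussian log-likelihood of a segment under a least-squares
    AR(order) fit. A simple stand-in for the modality-specific segment
    likelihoods; min_len guards against degenerate fits on tiny segments."""
    n = len(x)
    if n < min_len:
        return -np.inf
    # Lagged design matrix: predict x[t] from x[t-1], ..., x[t-order].
    X = np.column_stack([x[order - k - 1 : n - k - 1] for k in range(order)])
    y = x[order:]
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    var = max(resid.var(), 1e-8)  # floor to keep the log finite
    m = len(y)
    return -0.5 * m * (np.log(2 * np.pi * var) + 1.0)

def segment_map(x, order=2, penalty=10.0):
    """Toy MAP segmentation by O(n^2) dynamic programming: maximize the sum
    of AR segment log-likelihoods minus a penalty per change point. This is
    a simplified stand-in for Fearnhead's exact Bayesian recursions."""
    n = len(x)
    best = np.full(n + 1, -np.inf)
    best[0] = 0.0
    back = np.zeros(n + 1, dtype=int)
    for t in range(1, n + 1):
        for s in range(t):
            score = best[s] + ar_segment_loglik(x[s:t], order) - penalty
            if score > best[t]:
                best[t] = score
                back[t] = s
    # Backtrack to recover the change point locations.
    cps, t = [], n
    while t > 0:
        t = back[t]
        if t > 0:
            cps.append(t)
    return sorted(cps)
```

For example, on a synthetic signal whose variance jumps at index 60, `segment_map` recovers a change point near that index; in the paper's setting the same scoring idea would instead separate audio regimes within "object interaction" segments.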