In this paper, we aim to better understand clothing fashion styles. Two challenges remain: 1) how to quantitatively describe the fashion styles of various clothing, and 2) how to model the subtle relationship between visual features and fashion styles, especially when clothing collocations are taken into account. Using the words people commonly use to describe clothing fashion styles on shopping websites, we build a Fashion Semantic Space (FSS) based on Kobayashi's aesthetics theory to describe clothing fashion styles quantitatively and universally. We then propose a novel fashion-oriented multimodal deep learning model, the Bimodal Correlative Deep Autoencoder (BCDA), to capture the internal correlation in clothing collocations. On a benchmark dataset we build, consisting of 32,133 full-body fashion show images, we use BCDA to map visual features to the FSS. The experimental results indicate that our model outperforms several alternative baselines by 13% in terms of MSE, confirming that it better captures clothing fashion styles. To further demonstrate the advantages of our model, we conduct several case studies, including fashion trend analyses of brands and clothing collocation recommendation.
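Since the abstract does not specify BCDA's internals, the following is only a minimal sketch, in PyTorch, of the general idea it names: a bimodal autoencoder that fuses two visual-feature modalities (here assumed to be upper-body and lower-body garment features) into a shared code, reconstructs both modalities from that code so their correlation is captured, and regresses the code onto FSS coordinates (assumed 2-D, following Kobayashi's image scale). All class names, layer sizes, and loss weights are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn

class BimodalAutoencoder(nn.Module):
    """Illustrative bimodal autoencoder with an FSS regression head."""

    def __init__(self, feat_dim=512, latent_dim=128, fss_dim=2):
        super().__init__()
        # One encoder per modality: upper- and lower-body visual features.
        self.enc_upper = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                       nn.Linear(256, latent_dim))
        self.enc_lower = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                       nn.Linear(256, latent_dim))
        # Decoding BOTH modalities from the fused code forces it to
        # represent the correlation between the two garments.
        self.dec_upper = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                       nn.Linear(256, feat_dim))
        self.dec_lower = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                       nn.Linear(256, feat_dim))
        # Regressor from the shared code to FSS coordinates.
        self.to_fss = nn.Linear(latent_dim, fss_dim)

    def forward(self, x_upper, x_lower):
        z = self.enc_upper(x_upper) + self.enc_lower(x_lower)  # fused code
        return self.dec_upper(z), self.dec_lower(z), self.to_fss(z)

def loss_fn(model, x_u, x_l, fss_target, alpha=1.0, beta=1.0):
    rec_u, rec_l, fss_pred = model(x_u, x_l)
    mse = nn.functional.mse_loss
    # Reconstruction terms keep the code faithful to both garments;
    # the FSS term supervises the mapping into the semantic space.
    return alpha * (mse(rec_u, x_u) + mse(rec_l, x_l)) \
        + beta * mse(fss_pred, fss_target)

# Usage with random stand-in features for a batch of 8 outfits.
model = BimodalAutoencoder()
x_u, x_l = torch.randn(8, 512), torch.randn(8, 512)
loss = loss_fn(model, x_u, x_l, torch.randn(8, 2))
loss.backward()
```

The key design choice this sketch illustrates is that a single latent code is shared across both garments, so the reconstruction loss alone already ties the two modalities together; the FSS regression head then reuses that collocation-aware code for style prediction.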