Collaborative filtering (CF) approaches, which provide recommendations based on ratings or purchase history, perform well for users and items with sufficient interactions. However, CF approaches suffer from the cold-start problem for users and items with few ratings. Hybrid recommender systems that combine collaborative filtering with content-based approaches have proven effective at alleviating the cold-start issue. Integrating content from multiple heterogeneous data sources, such as reviews and product images, is nevertheless challenging for two reasons. First, mapping content in different modalities from the original feature space to a joint lower-dimensional space is difficult because the modalities have intrinsically different characteristics and statistical properties, for example sparse text versus dense images. Second, most algorithms use content features only as prior knowledge to improve the estimation of user and item profiles, so the ratings provide no direct feedback to guide feature extraction. To tackle these challenges, we propose a tightly coupled deep network model that fuses heterogeneous modalities, avoids tedious domain-specific feature extraction, and enables two-way information propagation between content and rating information. Experiments on large-scale Amazon product data in the book and movie domains demonstrate the effectiveness of the proposed model for cold-start recommendation.