Model‐based co‐clustering partitions the data along two main axes and simultaneously trains a supervised model for each co‐cluster using all other input features. For example, in the rating prediction task of a recommender system, the two main axes are items and users. Within each co‐cluster, we train a regression model that predicts the rating from the remaining features, such as user characteristics (e.g., gender), item characteristics (e.g., genre), and contextual features (e.g., location). In reality, users and items do not necessarily belong to a single co‐cluster but can be associated with several co‐clusters. We therefore extend model‐based co‐clustering to support fuzzy co‐clustering. In this setting, each item–user pair is associated with every co‐cluster with some membership grade, which indicates how relevant the pair is to that co‐cluster. Furthermore, we propose a distributed algorithm, based on a map‐reduce approach, to handle big datasets. An evaluation of the fuzzy co‐clustering algorithm on three datasets shows a significant improvement compared with a regular co‐clustering algorithm. In addition, the map‐reduce version of the fuzzy co‐clustering algorithm significantly reduces the runtime.
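
To make the role of the membership grades concrete, the following minimal sketch shows one way a fuzzy prediction could combine per‐co‐cluster regression models. It is an illustration under stated assumptions, not the algorithm evaluated here: the linear models, the softmax rule that turns squared errors into grades, and all names (`membership_grades`, `linear_predict`, `fuzzy_predict`, `temperature`) are hypothetical.

```python
import math


def membership_grades(errors, temperature=1.0):
    """Map per-co-cluster squared errors to grades that sum to 1.

    Assumption: a softmax over negative errors, so a co-cluster whose
    model fits the item-user pair better receives a higher grade.
    """
    smallest = min(errors)  # shift by the minimum for numerical stability
    scores = [math.exp(-(e - smallest) / temperature) for e in errors]
    total = sum(scores)
    return [s / total for s in scores]


def linear_predict(model, features):
    """Predict with one co-cluster's (hypothetical) linear regression model."""
    weights, bias = model
    return bias + sum(w * x for w, x in zip(weights, features))


def fuzzy_predict(models, grades, features):
    """Combine the co-cluster predictions, weighted by membership grade."""
    return sum(g * linear_predict(m, features) for g, m in zip(grades, models))


# Two hypothetical co-cluster models, each given as (weights, bias).
models = [([1.0, 0.0], 2.0), ([0.0, 1.0], 4.0)]
features = [1.0, 0.5]   # other features of one item-user pair
observed_rating = 3.0   # rating seen during training

# Squared error of each co-cluster model on this training pair.
errors = [(linear_predict(m, features) - observed_rating) ** 2 for m in models]
grades = membership_grades(errors)
prediction = fuzzy_predict(models, grades, features)
```

In a regular (hard) co‐clustering, the grade vector would contain a single 1 and zeros elsewhere, so the prediction would come from one model only. With fuzzy grades, every co‐cluster contributes in proportion to its relevance to the pair.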