Automatically extracting the complex set of features that compose real, high-dimensional data is crucial for achieving high performance in machine-learning tasks. Restricted Boltzmann Machines (RBM) are empirically known to be efficient for this purpose, and to be able to generate distributed and graded representations of the data. We characterize the structural conditions (sparsity of the weights, low effective temperature, nonlinearities in the activation functions of hidden units, and adaptation of fields maintaining the activity in the visible layer) allowing RBM to operate in such a compositional phase. Evidence is provided by the replica analysis of an adequate statistical ensemble of random RBMs and by RBM trained on the handwritten-digit dataset MNIST.

Recent years have witnessed major progress in supervised machine learning, e.g., in video, audio, and image processing [1]. Despite these impressive successes, unsupervised learning, in which the structure of the data is learned without a priori knowledge, still presents formidable challenges. A fundamental question is how to learn probability distributions that fit complex data manifolds in high-dimensional spaces well [2]. Once learned, such generative models can be used for denoising, completion, artificial data generation, etc. Hereafter we focus on one important generative model, the Restricted Boltzmann Machine (RBM) [3, 4]. In its simplest formulation an RBM is a Boltzmann machine on a bipartite graph, see Fig. 1(a), with a visible ($\mathbf{v}$) layer that represents the data, connected to a hidden ($\mathbf{h}$) layer meant to extract and explain their statistical features. The marginal distribution over the visible layer is fitted to the data through approximate likelihood maximization [5-8]. Once the parameters are trained, each hidden unit becomes selectively activated by a specific data feature; owing to the bidirectionality of the connections, the probability of generating visible-layer configurations in which this feature is present is, in turn, increased. Combining variable numbers of features, with varying degrees of activation of the corresponding hidden units, allows for the efficient generation of a large variety of new data samples. However, the existence of such a 'compositional' encoding seems to depend on the values of the RBM parameters, such as the size of the hidden layer [9]. Characterizing the conditions under which RBM can operate in this compositional regime is the purpose of the present work.

In the RBM shown in Fig. 1(a) the visible layer includes $N$ units $v_i$, with $i = 1, \ldots, N$, chosen here to be binary ($= 0, 1$). Visible units are connected to $M$ hidden units $h_\mu$, with $\mu = 1, \ldots, M$, through the weights $\{w_{i\mu}\}$. The energy of a configuration $\mathbf{v} = \{v_i\}$, $\mathbf{h} = \{h_\mu\}$ is defined through
$$E(\mathbf{v}, \mathbf{h}) = -\sum_{i=1}^{N} g_i\, v_i + \sum_{\mu=1}^{M} U_\mu(h_\mu) - \sum_{i,\mu} w_{i\mu}\, v_i\, h_\mu \ , \qquad (1)$$
where $g_i$ is a field acting on visible unit $i$, and $U_\mu$ is a potential acting on hidden unit $\mu$.

[Fig. 1: (a) Bipartite structure of an RBM. (b) Examples of hidden-unit potentials: $U_{\rm quadratic}(h) = h^2/2$; $U_{\rm Ber}(h) = h\,\theta_B$ if $h = 0$ or $1$, and $+\infty$ otherwise; $U_{\rm ReLU}(h) = h^2/2 + h\,\theta$ for $h \geq 0$, $+\infty$ for $h < 0$. (c) The three regimes of operation, see text. Black, grey, and white hidden units symbolize, respectively, strong, weak, and null activations.]
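To make Eq. (1) concrete, the following is a minimal NumPy sketch, not the authors' code, of the RBM energy with the Bernoulli potential $U_{\rm Ber}$, together with one step of alternating Gibbs sampling between the two layers; the variable names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 20, 5                                         # visible / hidden layer sizes
w = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, M))   # weights w_{i mu}
g = np.zeros(N)                                      # fields g_i on visible units
theta_B = np.zeros(M)                                # Bernoulli thresholds theta_B

def energy(v, h):
    """Eq. (1) with the Bernoulli potential U_Ber(h) = h * theta_B, h in {0, 1}."""
    return -g @ v + theta_B @ h - v @ w @ h

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v):
    """One alternating Gibbs sweep v -> h -> v."""
    # P(h_mu = 1 | v) = sigmoid(sum_i v_i w_{i mu} - theta_B_mu)
    h = (rng.random(M) < sigmoid(v @ w - theta_B)).astype(float)
    # P(v_i = 1 | h) = sigmoid(g_i + sum_mu w_{i mu} h_mu)
    v_new = (rng.random(N) < sigmoid(g + w @ h)).astype(float)
    return v_new, h
```

Because the graph is bipartite, each conditional distribution factorizes over the units of a layer, so a whole layer can be resampled in a single vectorized call.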
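The three potentials listed in the caption of Fig. 1(b) translate directly into code. The sketch below also shows how a ReLU hidden unit can be sampled given its input $I_\mu = \sum_i w_{i\mu} v_i$: completing the square in $-h^2/2 - h\theta + hI$ gives a unit-variance Gaussian of mean $I - \theta$ truncated to $h \geq 0$. This truncated-Gaussian form is inferred here from the stated potential, not quoted from the text.

```python
import numpy as np
from scipy.stats import truncnorm

def U_quadratic(h):
    """U(h) = h^2 / 2: a Gaussian (linear-response) hidden unit."""
    return 0.5 * h ** 2

def U_ber(h, theta_B):
    """U(h) = h * theta_B for h in {0, 1}, +inf otherwise (binary unit)."""
    return h * theta_B if h in (0.0, 1.0) else np.inf

def U_relu(h, theta):
    """U(h) = h^2/2 + h*theta for h >= 0, +inf for h < 0 (rectified unit)."""
    return 0.5 * h ** 2 + h * theta if h >= 0 else np.inf

def sample_relu_hidden(I, theta, seed=None):
    """Draw h with density proportional to exp(-U_relu(h, theta) + h * I):
    a Gaussian of mean (I - theta) and unit variance, truncated to h >= 0."""
    mu = I - theta
    # truncnorm takes standardized bounds: a = (0 - mu) / 1, b = +inf
    return truncnorm.rvs(a=-mu, b=np.inf, loc=mu, scale=1.0, random_state=seed)
```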
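The text states only that the marginal distribution over the visible layer is fitted by approximate likelihood maximization [5-8], without specifying the scheme. One widely used approximation, shown purely as an illustration and not as the procedure of Refs. [5-8], is contrastive divergence with a single Gibbs step (CD-1). The sketch continues the first code block above, reusing its w, g, theta_B, sigmoid, and gibbs_step.

```python
def cd1_update(v_data, lr=0.01):
    """One CD-1 update of (w, g, theta_B) from a single data vector v_data.
    Continues the first sketch: w, g, theta_B, sigmoid, gibbs_step as above."""
    global w, g, theta_B
    # Positive phase: hidden means conditioned on the data.
    ph_data = sigmoid(v_data @ w - theta_B)
    # Negative phase: one alternating Gibbs step away from the data.
    v_model, _ = gibbs_step(v_data)
    ph_model = sigmoid(v_model @ w - theta_B)
    # Ascend the approximate log-likelihood gradient; signs follow Eq. (1):
    # dE/dw_{i mu} = -v_i h_mu, dE/dg_i = -v_i, dE/dtheta_B_mu = +h_mu.
    w += lr * (np.outer(v_data, ph_data) - np.outer(v_model, ph_model))
    g += lr * (v_data - v_model)
    theta_B -= lr * (ph_data - ph_model)
```

In practice such updates are averaged over mini-batches, and more Gibbs steps (CD-k) or persistent chains are commonly used to improve the gradient estimate.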