Face alignment consists of aligning a shape model on a face image. It is an active domain in computer vision as it is a preprocessing for a number of face analysis and synthesis applications. Current state-of-the-art methods already perform well on "easy" datasets, with moderate head pose variations, but may not be robust for "in-the-wild" data with poses up to 90 • . In order to increase robustness to an ensemble of factors of variations (e.g. head pose or occlusions), a given layer (e.g. a regressor or an upstream CNN layer) can be replaced by a Mixture of Experts (MoE) layer that uses an ensemble of experts instead of a single one. The weights of this mixture can be learned as gating functions to jointly learn the experts and the corresponding weights. In this paper, we propose to use tree-structured gates which allows a hierarchical weighting of the experts (Tree-MoE). We investigate the use of Tree-MoE layers in different contexts in the frame of face alignment with cascaded regression, firstly for emphasizing relevant, more specialized feature extractors depending of a high-level semantic information such as head pose (Pose-Tree-MoE), and secondly as an overall more robust regression layer. We perform extensive experiments on several challenging face alignment datasets, demonstrating that our approach outperforms the state-of-the-art methods.