Face detection and facial points localization are interconnected tasks. Recently it has been shown that solving these two tasks jointly with a mixture of trees of parts (MTP) leads to state-of-the-art results. However, MTP, as most other methods for facial point localization proposed so far, requires a complete annotation of the training data at facial point level. This is used to predefine the structure of the trees and to place the parts correctly. In this work we extend the mixtures from trees to more general loopy graphs. In this way we can learn in a weakly supervised manner (using only the face location and orientation) a powerful deformable detector that implicitly aligns its parts to the detected face in the image. By attaching some reference points to the correct parts of our detector we can then localize the facial points. In terms of detection our method clearly outperforms the state-of-the-art, even if competing with methods that use facial point annotations during training. Additionally, without any facial point annotation at the level of individual training images, our method can localize facial points with an accuracy similar to fully supervised approaches.