Well-established databases and benchmarks for automatic facial behaviour analysis have been developed over the past 20 years. Nevertheless, for some important problems in the analysis of facial behaviour, such as (a) estimation of affect in a continuous dimensional space (e.g., valence and arousal) in videos displaying spontaneous facial behaviour and (b) detection of the activated facial muscles (i.e., facial action unit detection), to the best of our knowledge, well-established in-the-wild databases and benchmarks do not exist. That is, the majority of the publicly available corpora for these tasks contain samples captured under controlled recording conditions and/or in a very specific milieu. Arguably, in order to make further progress in the automatic understanding of facial behaviour, datasets captured in-the-wild and in a variety of milieus have to be developed. In this paper, we survey the recent progress on understanding facial behaviour in-the-wild, namely the datasets and methodologies developed thus far, paying particular attention to recently proposed deep learning techniques. Finally, we take a significant step further by proposing a novel, comprehensive benchmark that can be used for training and evaluating methodologies for facial affect and behaviour analysis and understanding "in-the-wild". To the best of our knowledge, this is the first benchmark proposed for measuring continuous affect in the valence-arousal space "in-the-wild".