Video streaming on mobile devices is prone to a multitude of faults. Although well-established video Quality of Experience (QoE) metrics such as stall frequency are a good indicator of the problems perceived by the user, they provide no insight into the nature of a problem or where it occurred. Quantifying the correlation between these faults and the users' experience is challenging due to the large number of variables and the numerous points of failure. To address this problem, we developed a framework for diagnosing the root cause of mobile video QoE issues with the aid of machine learning. Our solution can take advantage of information collected at multiple vantage points between the video server and the mobile device to pinpoint the source of the problem. Moreover, our design works for different video types (e.g., bitrate, duration) and contexts (e.g., wireless technology, encryption). After training the system with a series of simulated faults in the lab, we analyzed the performance of each vantage point separately and in combination, in both controlled and real-world deployments. In both cases, we find that the involved entities can independently detect QoE issues and that only a few vantage points are required to identify a problem's location and nature.