In the fifth-generation (5G) mobile networks, proactive network optimisation plays an important role in meeting the exponential traffic growth, more stringent service requirements, and to reduce capital and operational expenditure. Proactive network optimisation is widely acknowledged as one of the most promising ways to transform the 5G network based on big data analysis and cloud-fog-edge computing, but there are many challenges. Proactive algorithms will require accurate forecasting of highly contextualised traffic demand and quantifying the uncertainty to drive decision making with performance guarantees. Context in Cyber-Physical-Social Systems (CPSS) is often challenging to uncover, unfolds over time, and even more difficult to quantify and integrate into decision making. The first part of the review focuses on mining and inferring CPSS context from heterogeneous data sources, such as online user-generated-content. It will examine the state-of-the-art methods currently employed to infer location, social behaviour, and traffic demand through a cloud-edge computing framework; combining them to form the input to proactive algorithms. The second part of the review focuses on exploiting and integrating the demand knowledge for a range of proactive optimisation techniques, including the key aspects of load balancing, mobile edge caching, and interference management. In both parts, appropriate state-of-the-art machine learning techniques (including probabilistic uncertainty cascades in proactive optimisation), complexity-performance trade-offs, and demonstrative examples are presented to inspire readers. This survey couples the potential of online big data analytics, cloud-edge computing, statistical machine learning, and proactive network optimisation in a common cross-layer wireless framework. The wider impact of this survey includes better cross-fertilising the academic fields of data analytics, mobile edge computing, AI, CPSS, and wireless communications, as well as informing the industry of the promising potentials in this area.INDEX TERMS Online data, data analytics, proactive network optimisation, 5G.