Mobile video applications are increasingly prevalent and are enriching the way people learn and are entertained. However, on mobile terminals with inherently limited resources, mobile video streaming services consume excessive energy and bandwidth, a problem that urgently needs to be solved. Current research on cost-effective mobile video streaming typically focuses on managing data transmission, and some recent approaches additionally consider user behavior to further optimize that transmission. These studies, however, have not adequately examined how the physical environment affects user behavior. This paper therefore takes the environment-aware watching state into account and proposes a cost-effective mobile video streaming scheme that reduces power consumption and mobile data usage. First, the watching state is predicted by machine learning from user behavior and the physical environment observed during a given time window. Second, based on that prediction, a downloading algorithm is introduced that builds on the user equipment (UE) running mode in the LTE system and on the VLC player. Finally, experimental results obtained in a real-world environment show that, compared with its benchmarks, the proposed approach effectively reduces the data usage (on average 14.4% lower than that of the energy-aware benchmark) and the power consumption (by about 19% when there are screen touches) of mobile devices.