Caching at mobile edge servers can smooth temporal traffic variability and reduce service load of base stations in mobile video delivery. However, the assignment of multiple video representations to distributed servers is still a challenging question in the context of adaptive streaming, since any two representations from different videos or even from the same video will compete for the limited caching storage. It is therefore important, yet challenging, to optimally select the cached representations for each edge server in order to effectively reduce the service load of base station while maintaining a high quality of experience (QoE) for users. To address this, we study on a QoE-driven mobile edge caching placement optimization problem for dynamic adaptive video streaming that properly takes into account the different ratedistortion (R-D) characteristics of videos and the coordination among distributed edge servers. Then, by the optimal caching placement of representations for multiple videos, we maximize the aggregate average video distortion reduction of all users while minimizing the additional cost of representation downloading from the base station, subject not only to the storage capacity constraints at the edge servers, but also to the transmission and initial startup delay constraints at the users. We formulate the proposed optimization problem as an integer linear program (ILP) to provide the performance upper bound, and as a submodular maximization problem with a set of knapsack constraints to develop a practically feasible cost benefit greedy algorithm. The proposed algorithm has polynomial computational complexity and a theoretical lower bound on its performance. Simulation results further show that the proposed algorithm is able to achieve a near-optimal performance with very low time complexity. Therefore, the proposed optimization framework reveals the caching performance upper bound for general adaptive video streaming systems, while the proposed algorithm provides some design guidelines for the edge servers to select the cached representations in practice based on both the video popularity and content information.