Multipath routing in overlay networks is a promising way to improve the performance and availability of Internet applications without replacing the existing TCP/IP infrastructure. In this paper, we propose an approach for distributing data over multiple overlay paths that improves Quality of Service (QoS) metrics such as data transfer time, loss, and throughput. Using the Imbedded Markov Chain technique, we demonstrate that the system under analysis, observed at specific instants, possesses the Markov property. We therefore cast the data distribution problem into the Markov Decision Process (MDP) framework and design a computationally efficient algorithm, named Online Policy Iteration (OPI), to solve the optimization problem on the fly. The proposed approach is applied to multipath data distribution in various wired/wireless network scenarios, with the objective of minimizing data transfer time, delay, and losses. Through extensive ns-2 simulations driven by data collected from real heterogeneous networks, and through experiments over real networks, we show that the proposed traffic control mechanism outperforms two classical schemes, namely Weighted Round Robin and Join the Shortest Queue.
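
For readers unfamiliar with the two baseline schemes named above, the following minimal Python sketch illustrates their textbook forms. It is not the proposed OPI algorithm and not the authors' implementation. The function names, the integer weights, and the use of queue length as the only load signal are assumptions made for illustration.

```python
from itertools import cycle, islice

def weighted_round_robin(weights):
    """Return an endless iterator of path indices.

    Assumption: weights are small non-negative integers, and path i is
    chosen weights[i] times per cycle (simple, non-interleaved WRR).
    """
    schedule = [path for path, w in enumerate(weights) for _ in range(w)]
    return cycle(schedule)

def join_shortest_queue(queue_lengths):
    """Return the index of the path with the fewest queued packets.

    Ties go to the lowest index, because min() keeps the first minimum.
    """
    return min(range(len(queue_lengths)), key=queue_lengths.__getitem__)

def dispatch_jsq(num_packets, initial_queues):
    """Assign packets one by one with JSQ, assuming no packet departs
    while the batch is being dispatched (hypothetical simplification)."""
    queues = list(initial_queues)
    choices = []
    for _ in range(num_packets):
        path = join_shortest_queue(queues)
        queues[path] += 1  # the chosen path now holds one more packet
        choices.append(path)
    return choices

# Usage: two paths with weights 2:1, then JSQ over three paths.
wrr_choices = list(islice(weighted_round_robin([2, 1]), 6))
jsq_choices = dispatch_jsq(4, [2, 0, 1])
```

Both baselines decide from static weights or instantaneous queue lengths alone. The MDP formulation described in the abstract instead selects actions to optimize an objective over the evolution of the system state.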