In this paper, efficient real-time control strategies are devised for systems with an unknown state equation, based solely on a set of data inherited from non-optimized, possibly inefficient, operation of the system, for cases in which experimenting online with the system is impossible or costly. Neural networks and kernel smoothing models are employed as architectures for learning the system dynamics. The former require an offline training phase to learn the state equation, whereas the latter exploit the available data directly, making the proposed approach applicable online and able to incorporate newly available data without the need for offline retraining. Convergence properties of the proposed algorithm for generating the control strategies are established under suitable hypotheses. Simulation results on classic benchmark systems are reported for performance evaluation, including a comparison with the SARSA reinforcement learning algorithm.
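To illustrate the kernel smoothing idea mentioned above, the sketch below shows a Nadaraya-Watson-style estimator that predicts the next state directly from logged (state, control) data, with no offline training phase. This is a minimal illustration under assumed conventions (a Gaussian kernel, a toy linear system, and the hypothetical function name `kernel_smoothing_predict`), not the exact estimator used in the paper.

```python
import numpy as np

def kernel_smoothing_predict(X, Y, query, bandwidth=0.5):
    """Nadaraya-Watson kernel estimate of the next state.

    X : (n, d) array of logged (state, control) pairs
    Y : (n, m) array of corresponding observed next states
    query : (d,) point (state, control) at which to predict
    """
    # Gaussian kernel weights based on distance to the query point
    dists = np.linalg.norm(X - query, axis=1)
    w = np.exp(-0.5 * (dists / bandwidth) ** 2)
    w /= w.sum()
    # Prediction is a weighted average of the observed next states,
    # so new data can be used as soon as it is appended to X and Y
    return w @ Y

# Toy example: data logged from the linear system x' = 0.9*x + 0.1*u
rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=(200, 1))
controls = rng.uniform(-1, 1, size=(200, 1))
X = np.hstack([states, controls])
Y = 0.9 * states + 0.1 * controls

# Predict the next state for state 0.5 under control 0.2
pred = kernel_smoothing_predict(X, Y, np.array([0.5, 0.2]), bandwidth=0.2)
```

Because the estimator is just a weighted average over the dataset, appending a newly observed transition to `X` and `Y` updates the model immediately, which is what makes this family of models directly usable online.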