This paper presents a two-stage learning technique that combines a particle swarm optimization (PSO)-based fuzzy logic control (FLC) algorithm with a Q-learning fuzzy inference system (QFIS) algorithm. The PSO algorithm serves as a global optimizer that autonomously tunes the parameters of the fuzzy logic controller, while the QFIS algorithm serves as a local optimizer. We simulate mobile robots playing the differential form of the pursuit-evasion game, in which the pursuer must learn its default control strategy on-line by interacting with the evader. The evader is assumed to play a well-defined strategy: it runs away along the line of sight. The pursuer's learning process is driven by the rewards it receives from its environment. The proposed technique is compared through simulation with the default control strategy, the PSO-based fuzzy logic control algorithm, and the QFIS algorithm. Simulation results show that the proposed technique outperforms both the PSO-based fuzzy logic control algorithm and the QFIS algorithm with respect to learning time, a critical factor in on-line applications.
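To make the setup concrete, the following sketch illustrates the global-optimization stage under simplifying assumptions: a kinematic 2D pursuit-evasion game in which the evader flees along the line of sight, and a standard PSO loop that tunes the pursuer's controller gains to minimize capture time. For brevity, the fuzzy logic controller is stood in for by a simple two-parameter steering law on the bearing error; all names, gains, speeds, and bounds here are illustrative assumptions, not the paper's actual implementation.

```python
import math
import random

def simulate(params, T=600, dt=0.05):
    """Run one pursuit-evasion episode; return capture time (lower is better).

    params: (k_p, k_d) gains of a stand-in steering controller
    (a placeholder for the tuned fuzzy logic controller)."""
    k_p, k_d = params
    px, py, ph = 0.0, 0.0, 0.0          # pursuer position and heading
    ex, ey = 6.0, 4.0                   # evader position
    vp, ve = 1.0, 0.5                   # pursuer is faster than evader
    prev_err = 0.0
    dist = math.hypot(ex - px, ey - py)
    for step in range(T):
        dx, dy = ex - px, ey - py
        dist = math.hypot(dx, dy)
        if dist < 0.2:                  # capture radius
            return step * dt
        los = math.atan2(dy, dx)        # line-of-sight angle
        # wrapped bearing error between heading and line of sight
        err = math.atan2(math.sin(los - ph), math.cos(los - ph))
        ph += (k_p * err + k_d * (err - prev_err) / dt) * dt
        prev_err = err
        px += vp * math.cos(ph) * dt
        py += vp * math.sin(ph) * dt
        # evader strategy: run away along the line of sight
        ex += ve * dx / dist * dt
        ey += ve * dy / dist * dt
    return T * dt + dist                # penalize episodes with no capture

def pso(fitness, dim=2, n=12, iters=40, seed=1):
    """Minimal global-best PSO minimizing `fitness` over `dim` parameters."""
    rng = random.Random(seed)
    pos = [[rng.uniform(0.0, 5.0) for _ in range(dim)] for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    pbest = [p[:] for p in pos]
    pcost = [fitness(p) for p in pos]
    g = min(range(n), key=lambda i: pcost[i])
    gbest, gcost = pbest[g][:], pcost[g]
    for _ in range(iters):
        for i in range(n):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + cognitive + social terms (standard PSO update)
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            c = fitness(pos[i])
            if c < pcost[i]:
                pcost[i], pbest[i] = c, pos[i][:]
                if c < gcost:
                    gcost, gbest = c, pos[i][:]
    return gbest, gcost

best, cost = pso(simulate)
print("tuned gains:", best, "capture time:", cost)
```

In the two-stage technique described above, the parameters found by such a global PSO search would then be refined on-line by the QFIS local optimizer, with rewards from the environment driving the Q-learning updates.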