Abstract-The ability to improve teaching strategies online is important for an intelligent tutoring system (ITS) to perform adaptive teaching. Reinforcement learning (RL) may help an ITS acquire this ability. Conventionally, RL works in a Markov decision process (MDP) framework. However, to handle uncertainties in the teaching and studying processes, we need to apply the partially observable Markov decision process (POMDP) model in building an ITS. In a POMDP framework, it is difficult to use the improvement algorithms of conventional RL because the required state information is unavailable. In our research, we have developed a reinforcement learning technique that enables a POMDP-based ITS to learn from its teaching experience and improve its teaching strategies online.

Index Terms-Computer supported education, intelligent tutoring system, reinforcement learning, partially observable Markov decision process.