Learning Classifier Systems (LCSs) are rule-based adaptive systems that have both Reinforcement Learning (RL) and rule-discovery mechanisms for effective and practical online learning. An analysis of the reinforcement process of XCS, one of the currently mainstream LCSs, is performed from the aspect of RL. Upon comparing XCS's update method with gradient-descent-based parameter update in RL, differences are found in the following elements: (1) residual term, (2) gradient term, and (3) payoff definition. All possible combinations of the variants in each element are implemented and tested on multi-step benchmark problems. This revealed that few specific combinations work effectively with XCS's accuracy-based rule-discovery process, while pure gradient-descent-based update showed the worst performance.