Despite their proven effectiveness, many Michigan learning classifier systems cannot perform multi-step reinforcement learning in continuous spaces. To meet this technical challenge, some learning classifier systems have been designed to learn fuzzy logic rules. These systems can be broadly classified as strength-based or accuracy-based, and the latter have gained more research attention over the last decade. However, existing accuracy-based learning systems either primarily address single-step learning problems or require the action space to be discrete. In this paper, a new accuracy-based learning fuzzy classifier system is developed to explicitly handle continuous state input and continuous action output during multi-step reinforcement learning. Several technical improvements have been achieved while developing the new learning algorithm. In particular, we have extended Q-learning-like credit assignment methods to continuous spaces. To enable direct learning of stochastic strategies for action selection, we have also proposed a new fuzzy logic system with stochastic action outputs. Moreover, fine-grained learning of fuzzy rules has been achieved in our algorithm by using a natural gradient learning method. This is the first time these techniques have been used substantially in an accuracy-based learning fuzzy classifier system. In comparison with several recently proposed learning algorithms, our algorithm is shown to perform highly competitively on four benchmark learning problems and a robotics problem. The practical usefulness of our algorithm is also demonstrated by improving the performance of a wireless body area network.