This paper constructs an operant conditioning learning system based on fuzzy and probabilistic automata, which is used for online self-learning of fuzzy rules. The system learns its rules online by interacting with the environment and arrives at the best rule consequents. The probabilistic mechanism can guarantee the global optimality of the learning mechanism, and fuzzy inference improves the robustness and speed of learning. Furthermore, we adopt a structure with two fuzzy controllers to avoid the rule explosion problem, in which the number of rules grows exponentially with the number of input variables; this also reduces the difficulty of design. We apply the model to self-balancing control of an inverted pendulum. The simulation indicates that the designed operant conditioning learning system can realize self-learning of fuzzy rules and is especially advantageous when prior knowledge is lacking.
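As a rough illustration of how a probabilistic automaton can learn a rule consequent from interaction, the sketch below keeps one probability vector over candidate consequents for a single rule and reinforces the chosen candidate when the environment rewards it. The update law shown is a standard linear reward-inaction scheme, used here only as an assumption; the abstract does not state the paper's actual update rule. The names `select_consequent`, `reinforce`, `run_demo`, the learning rate, and the toy environment (candidate 2 is always rewarded) are all hypothetical, and fuzzy inference is omitted.

```python
import random


def select_consequent(probs, rng):
    """Sample the index of a candidate consequent according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def reinforce(probs, chosen, reward, rate=0.1):
    """Linear reward-inaction update (assumed, not taken from the paper).

    reward is in [0, 1]. A reward of 0 leaves probs unchanged; a positive
    reward moves probability mass toward the chosen candidate while keeping
    the vector normalized.
    """
    step = rate * reward
    return [
        p + step * (1.0 - p) if i == chosen else p * (1.0 - step)
        for i, p in enumerate(probs)
    ]


def run_demo(trials=200, seed=0):
    """Toy environment: only candidate index 2 is ever rewarded."""
    rng = random.Random(seed)
    probs = [1.0 / 3.0] * 3  # no prior knowledge: uniform start
    for _ in range(trials):
        chosen = select_consequent(probs, rng)
        reward = 1.0 if chosen == 2 else 0.0
        probs = reinforce(probs, chosen, reward)
    return probs


if __name__ == "__main__":
    print(run_demo())
```

Starting from a uniform vector mirrors the case of no prior knowledge: every candidate is tried at first, and probability mass then concentrates on the candidate that the environment rewards.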