Real-life stochastic problems are generally large-scale and difficult to model, and therefore suffer from the curse of dimensionality. Such problems cannot be solved by classical optimization methods. This paper presents a reinforcement learning algorithm that uses a fuzzy inference system, ANFIS, to find approximate solutions to semi-Markov decision problems (SMDPs). The performance of the developed algorithm is measured and compared with that of a classical reinforcement learning algorithm, SMART, in numerical examples. The examples show that the developed algorithm converges significantly faster as the problem size increases, and that the average cost it calculates approaches that of SMART as the number of epochs used in the developed algorithm increases.

a_1: (1,0,0,0) (0,1,0,0) (1,1,0,0) (0,2,0,0) (0,0,1,0) (0,3,0,0) (0,1,1,0) (0,4,0,0) (0,2,1,0) (0,0,0,1)
a_2: (2,0,0,0) (3,0,0,0) (2,1,0,0) (1,2,0,0) (1,0,1,0) (1,3,0,0) (1,1,1,0) (1,4,0,0) (0,0,2,0)
a_3: (4,0,0,0) (3,1,0,0) (2,2,0,0) (2,0,1,0) (2,3,0,0) (2,1,1,0) (0,5,0,0)
a_4: (5,0,0,0) (4,1,0,0) (3,2,0,0) (3,0,1,0) (3,3,0,0) (1,2,1,0)
a_5: (6,0,0,0) (5,1,0,0) (4,2,0,0) (4,0,1,0) (2,4,0,0)
a_6: (7,0,0,0) (6,1,0,0) (5,2,0,0) (3,1,1,0)
a_7: (8,0,0,0) (7,1,0,0) (4,3,0,0)
a_8: (9,0,0,0) (5,0,1,0)
a_9: (6,2,0,0)
a_10: (8,1,0,0)
a_11: (10,0,0,0)

Fig. 3. Average cost per unit time of the policies determined by the algorithms (Problem 1).
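To make the SMART baseline concrete, the following is a minimal sketch of one tabular update step for a cost-minimising SMDP, written in the commonly cited form of the SMART rule. It is not taken from this paper: the function names, the dictionary-based value table, and the parameter names (`alpha`, `rho`, `sojourn_time`) are illustrative assumptions, and the ANFIS-based approximation is not shown.

```python
from collections import defaultdict


def smart_update(q, state, action, next_state, next_actions,
                 cost, sojourn_time, rho, alpha):
    """Apply one SMART-style update to the action-value table q (assumed form).

    q            : dict mapping (state, action) -> value, updated in place
    cost         : cost incurred during the transition
    sojourn_time : time spent in `state` before the transition
    rho          : current estimate of the average cost per unit time
    alpha        : learning rate in (0, 1]
    """
    # Costs are minimised, so the best next action has the smallest value.
    best_next = min(q[(next_state, b)] for b in next_actions)
    # The transition cost is offset by the average cost accrued over the sojourn.
    target = cost - rho * sojourn_time + best_next
    q[(state, action)] = (1 - alpha) * q[(state, action)] + alpha * target
    return q[(state, action)]


def update_average_cost(total_cost, total_time, cost, sojourn_time):
    """Accumulate cost and time, and return the new average-cost estimate."""
    total_cost += cost
    total_time += sojourn_time
    return total_cost, total_time, total_cost / total_time


# Hypothetical usage with made-up states and actions.
q = defaultdict(float)
q[("s1", "a1")] = 2.0
q[("s1", "a2")] = 1.0
new_value = smart_update(q, "s0", "a1", "s1", ["a1", "a2"],
                         cost=5.0, sojourn_time=2.0, rho=1.5, alpha=0.5)
```

In the usual formulation the average-cost estimate is refreshed only after greedy (non-exploratory) actions. That bookkeeping is left to the caller in this sketch.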