A key challenge in designing drug dosing schedules is learning an optimal control policy when accurate information about the system is scarce. Artificial intelligence has great potential for shaping a smart control policy for the dosage of drugs in any treatment. Motivated by this issue, this paper develops and analyzes a Caputo–Fabrizio fractional-order model of cancer chemotherapy treatment. A fixed-point theorem and an iterative method are used to prove the existence and uniqueness of the solutions of the proposed model. To control cancer through chemotherapy treatment, a fuzzy reinforcement learning-based control method that uses the State-Action-Reward-State-Action (SARSA) algorithm is then proposed. To assess the performance of the proposed control method, simulations were conducted for young and elderly patients and for ten simulated patients with different parameters, and the results were compared with those of Watkins’s Q-learning control method for cancer chemotherapy drug dosing. The simulation results demonstrate the superiority of the proposed control method in terms of the mean squared error, the mean variance of the error, and the mean squared control action. In other words, the proposed method performs better in terms of eradicating tumor cells, preserving normal cells, and the amount of drug used during chemotherapy treatment.