Maintenance has always been a crucial aspect of the manufacturing and industrial sectors, and interest in using advanced machine learning and reinforcement learning models to improve maintenance strategies has surged recently. This paper develops a joint optimization model of maintenance and production for a production system with an adjustable production rate, in which the system's deterioration depends on that rate: increasing the production rate increases the expected deterioration. Two policies are proposed to control deterioration: a maintenance policy, which schedules and carries out maintenance actions on the system, and a production policy, which adjusts the production rate. To minimize the expected cost of the system over a finite planning horizon, the problem is formulated as a Markov decision process and solved with reinforcement learning algorithms such as Q-learning and SARSA. The hyperparameters of these algorithms are tuned using the value-iteration algorithm of dynamic programming. The resulting optimal actions, conditioned on the state of the system, enable efficient management of the production system while keeping its deterioration under control.
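
As an illustration of the approach summarized above, the following minimal sketch sets up a toy finite-horizon Markov decision process and solves it twice: exactly by backward induction (finite-horizon value iteration) and approximately by tabular Q-learning. Everything specific in it is an assumption made for illustration and is not taken from the paper: the four deterioration levels, the ten-period horizon, the three actions (`low`, `high`, `maintain`), the deterioration probabilities, and all cost values.

```python
import random

# All values below are illustrative assumptions, not the paper's parameters.
N_LEVELS = 4                                 # deterioration levels 0 (new) .. 3 (failed)
HORIZON = 10                                 # number of decision epochs
ACTIONS = ("low", "high", "maintain")        # two production rates plus maintenance
P_DETERIORATE = {"low": 0.2, "high": 0.5}    # higher rate -> faster deterioration
LOW_RATE_PENALTY = 4.0                       # opportunity cost of producing slowly
PREVENTIVE_COST = 6.0                        # maintenance before failure
CORRECTIVE_COST = 20.0                       # maintenance after failure
DOWNTIME_COST = 25.0                         # per-period cost of leaving a failed system idle


def transitions(state, action):
    """Return [(probability, next_state, cost)] for one decision epoch."""
    failed = N_LEVELS - 1
    if action == "maintain":
        cost = CORRECTIVE_COST if state == failed else PREVENTIVE_COST
        return [(1.0, 0, cost)]              # maintenance restores the system to level 0
    if state == failed:
        return [(1.0, failed, DOWNTIME_COST)]  # a failed system cannot produce
    p = P_DETERIORATE[action]
    cost = LOW_RATE_PENALTY if action == "low" else 0.0
    return [(p, state + 1, cost), (1.0 - p, state, cost)]


def backward_induction():
    """Exact expected cost-to-go V[t][s] and optimal policy[t][s]."""
    V = [[0.0] * N_LEVELS for _ in range(HORIZON + 1)]
    policy = [[None] * N_LEVELS for _ in range(HORIZON)]
    for t in range(HORIZON - 1, -1, -1):
        for s in range(N_LEVELS):
            q = {a: sum(p * (c + V[t + 1][s2]) for p, s2, c in transitions(s, a))
                 for a in ACTIONS}
            best = min(q, key=q.get)         # costs are minimized
            V[t][s], policy[t][s] = q[best], best
    return V, policy


def q_learning(episodes=20000, alpha=0.1, epsilon=0.1, seed=0):
    """Tabular Q-learning on sampled transitions; returns Q[t][s][action_index]."""
    rng = random.Random(seed)
    Q = [[[0.0] * len(ACTIONS) for _ in range(N_LEVELS)] for _ in range(HORIZON + 1)]
    for _ in range(episodes):
        s = 0                                # every episode starts with a new system
        for t in range(HORIZON):
            if rng.random() < epsilon:       # explore
                a = rng.randrange(len(ACTIONS))
            else:                            # exploit the current estimate
                a = min(range(len(ACTIONS)), key=lambda i: Q[t][s][i])
            outcomes = transitions(s, ACTIONS[a])
            _, s2, c = rng.choices(outcomes, weights=[o[0] for o in outcomes])[0]
            target = c + min(Q[t + 1][s2])   # Q[HORIZON] stays zero (terminal)
            Q[t][s][a] += alpha * (target - Q[t][s][a])
            s = s2
    return Q


if __name__ == "__main__":
    V, policy = backward_induction()
    Q = q_learning()
    print("exact expected cost from a new system:", V[0][0])
    print("Q-learning estimate of the same cost: ", min(Q[0][0]))
    print("exact first-epoch policy by level:    ", policy[0])
```

Because the toy model is small enough to solve exactly, the value-iteration result can serve as a reference when choosing `alpha`, `epsilon`, and the number of episodes, for example by keeping the setting whose estimate `min(Q[0][0])` comes closest to `V[0][0]`. This is one plausible reading of how a dynamic-programming solution can guide hyperparameter tuning, not necessarily the exact procedure used in the paper. A SARSA variant would differ only in the update target, which would use the Q-value of the action actually selected at the next epoch instead of the minimum over actions.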