Since December 2019, the new coronavirus has raged in China and subsequently all over the world. From the first days, researchers have tried to discover vaccines to combat the epidemic. Several vaccines are now available as a result of the contributions of those researchers. As a matter of fact, the available vaccines should be used in effective and efficient manners to put the pandemic to an end. Hence, a major problem now is how to efficiently distribute these available vaccines among various components of the population. Using mathematical modeling and reinforcement learning control approaches, the present article aims to address this issue. To this end, a deterministic Susceptible-Exposed-Infectious-Recovered-type model with additional vaccine components is proposed. The proposed mathematical model can be used to simulate the consequences of vaccination policies. Then, the suppression of the outbreak is taken to account. The main objective is to reduce the effects of Covid-19 and its domino effects which stem from its spreading and progression. Therefore, to reach optimal policies, reinforcement learning optimal control is implemented, and four different optimal strategies are extracted. Demonstrating the efficacy of the proposed methods, finally, numerical simulations are presented.