A Bonus-Malus System (BMS) is a risk management method used mostly in liability insurance; its most common application is motor third-party liability insurance. A BMS has finitely many classes, each with its own premium. At the start of the contract, each policyholder is assigned to the "initial class". If the policyholder has a claim in a given period, he/she moves to a worse class, so his/her premium may increase in the subsequent period. If he/she has no claim in a period, he/she moves to a better class, so his/her premium may decrease in the following period. The rule that determines how many classes the policyholder moves up or down is called the transition rule. For each class and each possible number of claims, it specifies the class to which the policyholder will be reassigned in the subsequent period.
Our contributions to the literature on BMS optimization can be summarized as follows:
• We investigate a model introduced by Heras et al. (2004), but with a modified objective function. We prove that, with this objective function, there always exists an optimal premium scale in which every premium equals the expected claim of one of the risk groups.
• We consider the same model with a profit constraint. In this case, we prove that there always exists an optimal premium scale in which only one premium level differs from the expected claim of every risk group.
• We introduce a mixed-integer linear programming (MILP) model for optimizing transition rules with fixed premiums. We consider the optimization of both unified and non-unified transition rules. For unified transition rules, we give a rule for excluding transition rules that would lead to a reducible (i.e., not irreducible) Markov chain.
• We introduce a MILP model for the joint optimization of transition rules and premiums. Without the profit constraint, the exact solution can be determined for the investigated objective function; with the profit constraint, it can only be approximated.
• We introduce an extended version of the model that uses multi-period optimization instead of the stationary probabilities.
• We introduce modeling approaches that combine the BMS premium with other statistical estimates in the final premium, and we compare the methods in numerical experiments on realistic data.
• We introduce an optimization model for a BMS in which the classification depends on the claim amount.
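The Markov-chain view of a BMS can be made concrete with a small Python sketch. It builds the class-to-class transition matrix induced by a transition rule, checks whether the resulting chain is irreducible, and computes its stationary probabilities. Several assumptions are made purely for illustration and do not come from this work. The yearly claim count is taken as Poisson with a hypothetical frequency `lam`. A transition rule is encoded as a table in which `rule[i][k]` is the next class after `k` claims in class `i`, with the last column standing for "that many or more" claims. The example rule and premium scale are made-up values. The helper names (`transition_matrix`, `is_irreducible`, `stationary_distribution`) are likewise hypothetical.

```python
import math

import numpy as np


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k claims, assuming a Poisson(lam) claim count."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def transition_matrix(rule: list[list[int]], lam: float) -> np.ndarray:
    """Class-to-class transition matrix induced by a (hypothetical) rule table.

    rule[i][k] is the class reached from class i after k claims; the last
    column covers 'K or more' claims and receives the Poisson tail mass.
    """
    n = len(rule)
    K = len(rule[0]) - 1
    probs = [poisson_pmf(k, lam) for k in range(K)]
    probs.append(1.0 - sum(probs))  # P(number of claims >= K)
    P = np.zeros((n, n))
    for i, row in enumerate(rule):
        for k, j in enumerate(row):
            P[i, j] += probs[k]
    return P


def is_irreducible(P: np.ndarray) -> bool:
    """True if every class is reachable from every class (DFS on entries > 0)."""
    n = P.shape[0]
    for start in range(n):
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in np.nonzero(P[u] > 0)[0]:
                if int(v) not in seen:
                    seen.add(int(v))
                    stack.append(int(v))
        if len(seen) < n:
            return False
    return True


def stationary_distribution(P: np.ndarray) -> np.ndarray:
    """Solve pi P = pi together with sum(pi) = 1 as a least-squares system."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]


# Hypothetical 4-class "-1/+2" rule; columns = 0, 1, 2+ claims; class 0 is best.
rule = [
    [0, 2, 3],
    [0, 3, 3],
    [1, 3, 3],
    [2, 3, 3],
]
P = transition_matrix(rule, lam=0.1)
pi = stationary_distribution(P)
premiums = np.array([60.0, 80.0, 100.0, 140.0])  # hypothetical premium scale
print(is_irreducible(P), pi.round(4), float(pi @ premiums))
```

Replacing the last row of `rule` with `[3, 3, 3]` makes the worst class absorbing, and `is_irreducible` then returns `False`. This is an example of a non-irreducible chain, for which stationary probabilities are no longer a meaningful basis for comparing premium scales. The least-squares formulation is a convenience choice here; for an irreducible chain it returns the unique stationary distribution.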