In recent years, many researchers and practitioners have investigated the role of uncertainty in supply chain management. This paper models a multi-period stochastic supply chain with demand uncertainty and supplier disruption. The model considers two types of retailers, risk-sensitive and risk-neutral, together with many capacitated suppliers. Autonomous retailers have three options for satisfying demand: ordering from primary suppliers, ordering from reserved suppliers, and buying on the spot market. The goal is to find the best behavior of the risk-sensitive retailer with respect to forward and option contracts over several contract periods, based on the profit function. To this end, an agent-based simulation approach is developed to simulate the supply chain and the transactions between retailers and unreliable suppliers. In addition, a Q-learning approach (a reinforcement learning method) is developed to optimize the simulation procedure, and different configurations of the simulation procedure are analyzed. The RNetLogo package is used to implement the algorithm. A numerical example is solved using the proposed simulation-optimization approach, and several sensitivity analyses are conducted on different parameters of the model. Comparison of the numerical results with those of a genetic algorithm demonstrates the significant efficiency of the proposed Q-learning approach.

Keywords: Supply chain management, simulation-based optimization, reinforcement learning, demand uncertainty, supplier disruption
Introduction

The importance of uncertainties, and the costs that follow from ignoring them, has caused a shift from deterministic supply chain configurations to stochastic models. One of the most important problems in stochastic supply chain ordering management is the newsvendor (NV) problem. The basic form of the NV problem consists of a buyer and a seller, in which the buyer must decide how much to order from the seller while customer demand is not known in advance. In this basic form, the buyer has only aggregate information about customer demand, such as its distribution function, and the decision is made for a single period. The objective is to maximize the buyer's profit. Researchers have extended the problem in two directions: the multi-period NV problem (MNVP) and the NV problem with supplier disruption (NVPSD). In the MNVP, the buyer(s) decide on the order quantity from the seller(s) at the beginning of each period, based on the uncertain demand of their customers and the inventory remaining from the previous period. In the NVPSD (which usually consists of a single period), the buyer(s) decide on the order quantities based on the uncertain customer demand and the remaining fixed capacities of the sellers. The NVPSD literature usually assumes that the network consists of many uncertain sellers and one buyer (e.g. [1], [2], [3]). On the other hand, in the literature of the MNVP, ...