Purpose: The study models an interest rate index option using a heavy-tailed distribution. The goal is to calculate prices of path-dependent interest rate options that are consistent with market data and to develop a reinforcement learning strategy that hedges the position discretely while accounting for transaction costs.
Methodology: This paper presents a mathematical framework to calculate the price of path-dependent interest rate options. The research adapts a Fourier cosine series formula to use the characteristic function of the present value of the forward index, which is modeled as a variance-gamma process, and uses deep Q-learning to hedge such options.
Findings: There is market evidence that the implied volatility curve is not flat. The study demonstrates that the variance-gamma process generates an increasing volatility smile, which is consistent with market observations. Additionally, hedging results show that path-dependent options priced under the variance-gamma process can be efficiently hedged with deep Q-learning.
Research limitations/implications: The study considers only the variance-gamma process. Other probability distributions, such as the Normal Inverse Gaussian model, should be investigated.
Practical implications: This study reveals which type of probability distribution should be present in a pricing engine for it to be consistent with implied volatilities. The approach provided here can assist managers in evaluating and understanding market pricing behavior, as well as in achieving discrete hedging with transaction costs.
Originality: The paper combines a fast pricing method for interest rate options under a heavy-tailed distribution with discrete hedging of interest rate derivatives through reinforcement learning.
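
Illustration: The pricing idea named under Methodology, a Fourier cosine series expansion driven by a variance-gamma characteristic function, can be sketched in a few lines. The example below is a minimal sketch under simplifying assumptions and is not the paper's implementation: it prices a plain European call on a single asset rather than a path-dependent interest rate index option, it uses the standard variance-gamma parameters (`sigma`, `theta`, `nu`) with a constant rate `r`, and all function names, the truncation width, and the number of series terms are hypothetical choices made for illustration. The put is priced first and converted to a call by put-call parity, which is a common way to keep the cosine expansion numerically stable.

```python
import numpy as np
from math import erf, exp, log, sqrt


def vg_char_fn(u, t, r, sigma, theta, nu):
    """Characteristic function of ln(S_t / S_0) under a risk-neutral
    variance-gamma model (assumed parametrisation: sigma, theta, nu)."""
    # Martingale correction so that the discounted asset is a martingale.
    omega = np.log(1.0 - theta * nu - 0.5 * sigma**2 * nu) / nu
    drift = np.exp(1j * u * (r + omega) * t)
    base = 1.0 - 1j * u * theta * nu + 0.5 * sigma**2 * nu * u**2
    return drift * base ** (-t / nu)


def cos_call_price(s0, strike, t, r, sigma, theta, nu,
                   n_terms=2048, width=10.0):
    """European call via a Fourier cosine expansion of the log-moneyness
    density. Assumes the truncation range [a, b] contains zero."""
    x = np.log(s0 / strike)
    omega = np.log(1.0 - theta * nu - 0.5 * sigma**2 * nu) / nu
    # Cumulants of the log-return, used only to size the truncation range.
    c1 = (r + omega + theta) * t
    c2 = (sigma**2 + nu * theta**2) * t
    c4 = 3.0 * (sigma**4 * nu + 2.0 * theta**4 * nu**3
                + 4.0 * sigma**2 * theta**2 * nu**2) * t
    half = width * np.sqrt(c2 + np.sqrt(c4))
    a, b = x + c1 - half, x + c1 + half

    k = np.arange(n_terms)
    u = k * np.pi / (b - a)

    # Cosine coefficients of the put payoff K * (1 - e^y)^+ on [a, 0].
    c, d = a, 0.0
    chi = (np.cos(u * (d - a)) * np.exp(d) - np.cos(u * (c - a)) * np.exp(c)
           + u * np.sin(u * (d - a)) * np.exp(d)
           - u * np.sin(u * (c - a)) * np.exp(c)) / (1.0 + u**2)
    psi = np.empty(n_terms)
    psi[0] = d - c
    psi[1:] = (np.sin(u[1:] * (d - a)) - np.sin(u[1:] * (c - a))) / u[1:]
    payoff_coeffs = 2.0 / (b - a) * strike * (psi - chi)

    # Density coefficients come from the characteristic function.
    phi = vg_char_fn(u, t, r, sigma, theta, nu) * np.exp(1j * u * (x - a))
    terms = np.real(phi) * payoff_coeffs
    terms[0] *= 0.5  # the first term of a cosine series is halved
    put = np.exp(-r * t) * terms.sum()

    # Put-call parity converts the put into the call.
    return put + s0 - strike * np.exp(-r * t)


def black_scholes_call(s0, strike, t, r, sigma):
    """Black-Scholes call, used only as a sanity check for the sketch."""
    d1 = (log(s0 / strike) + (r + 0.5 * sigma**2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    cdf = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))
    return s0 * cdf(d1) - strike * exp(-r * t) * cdf(d2)


if __name__ == "__main__":
    # Illustrative parameters only; they are not taken from the study.
    print(cos_call_price(100.0, 90.0, 1.0, 0.1, 0.12, -0.14, 0.2))
```

With `theta = 0` and a very small `nu`, the variance-gamma price from this sketch should approach the Black-Scholes price with the same `sigma`, which gives a quick consistency check. Larger `nu` fattens the tails, which is the mechanism behind a non-flat implied volatility curve.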