The perturb and observe (P&O) algorithm is the most frequently used maximum power point tracking (MPPT) technique in photovoltaic (PV) systems. P&O is easy to compute and implement, but it is prone to oscillation around the maximum power point. This oscillation causes considerable power loss, so the tracked power is inaccurate. This study uses the deep Q-network (DQN) algorithm to improve P&O performance: DQN corrects the P&O output by applying various step sizes. The proposed method increased the tracking speed by 33.3% to 50% when receiving the same value at different times, and it reduced the oscillation rate by 73.99% to 83.5%. Along with the faster tracking and lower oscillation rate, the tracked power averaged 95%, which is better than that of the P&O and DQN algorithms used on their own. These results show that the proposed method performs well in terms of efficiency and oscillation rate, and that it tracks the maximum power faster than previous related works.
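The trade-off that motivates variable step sizes can be illustrated with a minimal sketch of classic fixed-step P&O. This is not the proposed DQN-corrected method. The power-voltage curve `pv_power`, its 100 W peak at 30 V, the starting voltage, and the helper names are all hypothetical choices made for illustration. The sketch only shows that a larger fixed step reaches the neighborhood of the maximum power point sooner but produces a larger steady-state power ripple.

```python
def pv_power(v):
    """Toy concave power-voltage curve (assumption): peak of 100 W at 30 V."""
    return max(0.0, 100.0 - 0.5 * (v - 30.0) ** 2)


def perturb_and_observe(v0, step, n_iter, power_fn=pv_power):
    """Classic fixed-step P&O; returns a list of (voltage, power) samples."""
    p_prev = power_fn(v0)
    direction = 1.0              # +1 raises the voltage, -1 lowers it
    v = v0 + direction * step    # first perturbation
    history = []
    for _ in range(n_iter):
        p = power_fn(v)          # observe the power after the perturbation
        if p < p_prev:           # power dropped, so reverse the direction
            direction = -direction
        history.append((v, p))
        p_prev = p
        v = v + direction * step # perturb again
    return history


def steps_to_reach(history, p_target):
    """Number of iterations until the power first reaches p_target."""
    for i, (_, p) in enumerate(history):
        if p >= p_target:
            return i + 1
    return None


def steady_state_ripple(history, tail=20):
    """Peak-to-peak power variation over the last `tail` samples."""
    powers = [p for _, p in history[-tail:]]
    return max(powers) - min(powers)


small = perturb_and_observe(v0=20.0, step=1.0, n_iter=60)
large = perturb_and_observe(v0=20.0, step=4.0, n_iter=60)

print("small step:", steps_to_reach(small, 98.0), "iterations, ripple",
      steady_state_ripple(small), "W")
print("large step:", steps_to_reach(large, 98.0), "iterations, ripple",
      steady_state_ripple(large), "W")
```

On this toy curve neither fixed step gives both fast tracking and low ripple, which is the gap a step-size selection scheme such as the DQN correction described above is meant to close.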