In wireless communication systems, reliability, low latency, and power efficiency are essential in large-scale multi-hop environments. Multi-hop cooperative communication is an efficient way to meet these requirements. This paper proposes a signal-to-noise ratio (SNR) based Q-learning relay selection scheme that selects an optimal relay for reliable multi-hop transmission. Q-learning consists of an agent, an environment, states, actions, and a reward. Once learning has converged, the agent has learned the optimal policy, that is, the rule for choosing the actions that maximize the reward. In other words, the base station (BS) knows which relay to select for transmitting the signal. The cooperative communication scheme considered in this paper is decode-and-forward (DF) in an orthogonal frequency division multiplexing (OFDM) system. In the proposed scheme, the reward is defined in terms of the SNR, and the environment is defined so that the agent maximizes this reward. Simulation results show that the proposed scheme achieves the same bit error rate (BER) performance as the conventional relay selection scheme. Moreover, the proposed scheme selects fewer relays than the conventional scheme while still satisfying the target BER, which reduces latency and the waste of resources. Therefore, the performance of multi-hop transmission in wireless networks is enhanced.
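
To make the idea of an SNR-based reward concrete, the following is a minimal sketch of tabular Q-learning for relay selection. It is an illustration under stated assumptions, not the formulation used in this paper: the layered topology, the static per-link SNR table `snr_db`, the function names `train_relay_selector` and `greedy_route`, and all hyperparameter values are hypothetical. The state is the node currently holding the signal, the action is the relay chosen for the next hop, and the reward is the SNR (in dB) of the chosen link.

```python
import random


def train_relay_selector(snr_db, episodes=2000, alpha=0.1, gamma=0.9,
                         epsilon=0.2, seed=0):
    """Tabular Q-learning over a layered relay network (illustrative only).

    snr_db[h][i][j] is the assumed static SNR in dB of the link from
    node i in layer h to node j in layer h + 1. Layer 0 holds only the BS.
    """
    rng = random.Random(seed)
    hops = len(snr_db)
    # q[h][i][j]: learned value of forwarding from node i (layer h) to node j.
    q = [[[0.0] * len(row) for row in layer] for layer in snr_db]
    for _ in range(episodes):
        node = 0  # every episode starts at the BS
        for h in range(hops):
            n_actions = len(snr_db[h][node])
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)  # explore a random relay
            else:
                # exploit the relay with the highest current Q-value
                action = max(range(n_actions), key=lambda a: q[h][node][a])
            reward = snr_db[h][node][action]  # assumption: reward = link SNR
            # Best value reachable from the next node; zero after the last hop.
            future = max(q[h + 1][action]) if h + 1 < hops else 0.0
            q[h][node][action] += alpha * (
                reward + gamma * future - q[h][node][action]
            )
            node = action
    return q


def greedy_route(q):
    """Follow the learned policy from the BS and return the chosen indices."""
    node, route = 0, []
    for layer in q:
        node = max(range(len(layer[node])), key=lambda a: layer[node][a])
        route.append(node)
    return route


if __name__ == "__main__":
    # Hypothetical two-hop example: BS -> one of three relays -> destination.
    example_snr_db = [
        [[12.0, 20.0, 8.0]],       # BS to relays 0, 1, 2
        [[5.0], [15.0], [25.0]],   # each relay to the destination
    ]
    print(greedy_route(train_relay_selector(example_snr_db)))
```

In this hypothetical example the learned policy accounts for both hops rather than only the first link: the discounted return through relay 1 is 20 + 0.9 × 15 = 33.5, which exceeds 8 + 0.9 × 25 = 30.5 through relay 2 and 12 + 0.9 × 5 = 16.5 through relay 0. The sketch does not model the DF/OFDM physical layer or the target-BER criterion; those would enter through how the reward and the termination condition are defined.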