With the continuous evolution and deep integration of wireless communication and emerging technologies such as the Internet of Things (IoT) and artificial intelligence (AI), the number of wireless terminals is growing exponentially, which poses great challenges to the available spectrum resources. The contradiction between ever-growing spectrum demand and limited spectrum resources has become a bottleneck restricting the development of wireless communication technology. As an efficient way to improve spectrum efficiency, cognitive radio (CR) has remained a focus of wireless communication research for decades. The main procedure in CR is spectrum sensing (SS), namely the discovery of available spectrum holes by periodically monitoring the target authorized band. The energy detector (ED) is widely adopted for SS owing to its low complexity and ease of implementation. Traditional ED-based SS schemes essentially adapt either the sensing threshold or the sampling point to the environmental signal-to-noise ratio (SNR) at the receiver of the CR terminal; these are referred to as adaptive sensing threshold based SS and adaptive sampling point based SS, respectively. However, the performance of both schemes comes at the expense of computational complexity because of the excessive number of sampling points. In addition, both schemes optimize only a single variable under constraints. In fact, for a given SNR, both the detection probability and the false alarm probability of the ED are two-dimensional functions of the sensing threshold and the sampling point. The optimal sensing performance therefore cannot be obtained by optimizing the sensing threshold or the sampling point alone. Motivated by these observations, this paper considers the joint optimization of the sampling point and the sensing threshold for SS, where both are jointly adapted to the variation of the environmental SNR.
In addition, because the considered optimization problem is non-convex, Q-learning is employed in this paper to obtain a sub-optimal solution. Finally, simulation experiments are carried out, and the results validate the effectiveness of the proposed scheme.
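
To illustrate why the detection probability and the false alarm probability of the ED depend jointly on the sensing threshold and the sampling point, the following minimal sketch evaluates both quantities under a commonly used Gaussian (central limit theorem) approximation. It assumes that the test statistic is the average energy of N samples, that the primary signal is complex PSK, and that the noise is circularly symmetric complex Gaussian. These modeling assumptions, the function names, and the numerical values are illustrative only and are not taken from this paper.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def false_alarm_probability(threshold, num_samples, noise_power=1.0):
    """Approximate P_fa of an energy detector (assumed Gaussian approximation).

    threshold   : sensing threshold applied to the average sample energy
    num_samples : number of sampling points N
    noise_power : noise variance (assumed known)
    """
    return q_function((threshold / noise_power - 1.0) * math.sqrt(num_samples))


def detection_probability(threshold, num_samples, snr, noise_power=1.0):
    """Approximate P_d of an energy detector (assumed Gaussian approximation).

    snr is the linear (not dB) signal-to-noise ratio at the CR receiver.
    """
    scale = math.sqrt(num_samples / (2.0 * snr + 1.0))
    return q_function((threshold / noise_power - snr - 1.0) * scale)


# Hypothetical operating point: SNR of -10 dB, unit noise power.
snr_linear = 10.0 ** (-10.0 / 10.0)

# Evaluate both probabilities over a small (threshold, N) grid to show
# that each one changes along both dimensions.
for num_samples in (100, 500, 1000):
    for threshold in (1.02, 1.05, 1.08):
        p_fa = false_alarm_probability(threshold, num_samples)
        p_d = detection_probability(threshold, num_samples, snr_linear)
        print(f"N={num_samples:5d}  threshold={threshold:.2f}  "
              f"P_fa={p_fa:.4f}  P_d={p_d:.4f}")
```

Under these assumptions, moving along either axis of the grid changes both probabilities, which is consistent with the observation that tuning the sensing threshold or the sampling point alone explores only a one-dimensional slice of the achievable sensing performance.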