The paper proposes a novel approach to adaptively selecting the sample size for a trial solution of an evolutionary algorithm when noise of unknown distribution contaminates the objective surface. The sample size of a solution is adapted based on the noisy fitness profile in its local neighborhood (LN). The fitness estimate and the fitness variance of a sub-population surrounding the given solution are jointly used to characterize the degree of noise contamination in its LN. The adaptation of sample size to the characteristics of the fitness landscape in the LN of a solution is realized with temporal difference Q-learning (TDQL). The merit of the present work lies in utilizing the reward-penalty based reinforcement learning mechanism of TDQL for sample size adaptation. This sidesteps the need to prescribe any specific functional form for the relationship between a solution's sample size requirement and the noisy fitness profile in its LN. Experiments reveal that the proposed algorithms, realized with the artificial bee colony algorithm, significantly outperform existing counterparts and state-of-the-art algorithms.
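
The abstract does not spell out the TDQL formulation, so the following is a minimal, self-contained sketch of how such a sample-size adapter might look, assuming the LN fitness estimate and variance are discretized into a state and the candidate sample sizes act as the Q-learning actions. All names and constants here (MEAN_BINS, VAR_BINS, SAMPLE_SIZES, the learning parameters, and the reward shape) are illustrative assumptions, not the paper's actual settings.

```python
import random
import statistics
from collections import defaultdict

# Illustrative discretization and action set (assumptions, not the paper's values).
MEAN_BINS = 5                        # bins for the LN fitness estimate
VAR_BINS = 5                         # bins for the LN fitness variance
SAMPLE_SIZES = [1, 5, 10, 20, 40]    # candidate sample sizes = TDQL actions

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # TD learning rate, discount, exploration

Q = defaultdict(float)               # Q[(state, action)] -> estimated value

def discretize(mean_fit, var_fit, mean_max, var_max):
    """Map the LN fitness estimate and variance to a discrete state."""
    m = min(int(MEAN_BINS * mean_fit / (mean_max + 1e-12)), MEAN_BINS - 1)
    v = min(int(VAR_BINS * var_fit / (var_max + 1e-12)), VAR_BINS - 1)
    return (m, v)

def choose_sample_size(state):
    """Epsilon-greedy selection of a sample size for the current state."""
    if random.random() < EPSILON:
        return random.choice(SAMPLE_SIZES)
    return max(SAMPLE_SIZES, key=lambda a: Q[(state, a)])

def td_update(state, action, reward, next_state):
    """One-step TD Q-learning: Q <- Q + alpha * (r + gamma * max Q' - Q)."""
    best_next = max(Q[(next_state, a)] for a in SAMPLE_SIZES)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

if __name__ == "__main__":
    # Toy demo on a noisy 1-D objective. The reward trades estimator
    # precision against evaluation cost; this reward design is an
    # assumption, not the paper's exact reward-penalty scheme.
    noisy_f = lambda x: x * x + random.gauss(0.0, 2.0)
    state = (0, 0)
    for step in range(200):
        x = random.uniform(-2.0, 2.0)            # stand-in for a trial solution
        n = choose_sample_size(state)            # adapted sample size
        samples = [noisy_f(x) for _ in range(n)]
        var = statistics.variance(samples) if n > 1 else 4.0
        reward = -var - 0.05 * n                 # precision vs. cost trade-off
        next_state = discretize(abs(statistics.mean(samples)), var, 8.0, 8.0)
        td_update(state, n, reward, next_state)
        state = next_state
```

In this reading, the tabular Q-function is what replaces a hand-crafted mapping from noise level to sample size: the reward-penalty feedback shapes the state-to-sample-size policy online, which matches the abstract's claim of sidestepping any prescribed functional form.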