This paper studies the packet scheduling problem in Broadband Wireless Access (BWA) networks. The key difficulties of the BWA scheduling problem lie in the high variability of wireless channel capacity and the unknown model of packet arrival process. It is difficult for traditional heuristic scheduling algorithms to handle the situation and guarantee satisfying performance in BWA networks. In this paper, we introduce learning-based approach for a better solution. Specifically, we formulate the packet scheduling problem as an average cost Semi-Markov Decision Process (SMDP). Then, we solve the SMDP by using reinforcement learning. A feature-based linear approximation and the Temporal-Difference learning technique are employed to produce a near optimal solution of the corresponding SMDP problem. The proposed algorithm, called Reinforcement Learning Scheduling (RLS), has in-built capability of self-training. It is able to adaptively and timely regulate its scheduling policy according to the instantaneous network conditions. Simulation results indicate that RLS outperforms two classical scheduling algorithms and simultaneously considers: (i) effective QoS differentiation, (ii) high bandwidth utilization, and (iii) both short-term and longterm fairness.