In-band full-duplex communication can potentially double wireless channel capacity. However, efficiently translating the full-duplex gain at the physical layer into network throughput improvement remains a challenge, especially in dynamic communication environments. This paper presents a reinforcement learning-based full-duplex (RLFD) medium access control (MAC) protocol for wireless local-area networks (WLANs) with full-duplex access points. To resolve channel contention and fully exploit full-duplex transmission opportunities, we first design a two-way handshake transmission mechanism and investigate the effects of transmission scheduling on full-duplex transmission. We then formulate the transmission scheduling problem as a non-stationary multi-armed bandit problem whose objective is to maximize network throughput. To solve it, we develop a Window-Constraint Bayesian (WCB) algorithm that generates optimized scheduling policies online, and full-duplex opportunities are exploited by transmitting packets according to these policies. In addition, we develop an analytical model to characterize the performance of RLFD. Simulation results show that RLFD can improve network throughput by 80% compared with the IEEE 802.11 distributed coordination function (DCF) with Request-To-Send/Clear-To-Send (RTS/CTS). Moreover, with the proposed WCB algorithm, network throughput remains stable as the communication environment changes dynamically.
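The abstract does not describe how WCB works internally. The sketch below illustrates the general idea of a window-constrained Bayesian bandit for a non-stationary problem using sliding-window Thompson sampling with Beta-Bernoulli posteriors. This is a generic, swapped-in technique and not the authors' WCB algorithm. The arms (standing in for candidate scheduling policies), the 0/1 "transmission succeeded" reward, the window length `window`, and the simulated success rates are all hypothetical.

```python
import random
from collections import deque


class SlidingWindowThompson:
    """Beta-Bernoulli Thompson sampling that only trusts the last `window` rewards.

    Hypothetical stand-in for a windowed Bayesian bandit; NOT the paper's WCB.
    """

    def __init__(self, n_arms: int, window: int, seed: int = 0):
        self.n_arms = n_arms
        # (arm, reward) pairs; the oldest entries drop out automatically,
        # so the posteriors forget stale observations after the environment changes.
        self.history = deque(maxlen=window)
        self.rng = random.Random(seed)

    def select(self) -> int:
        # Count successes and failures per arm inside the window only.
        succ = [0] * self.n_arms
        fail = [0] * self.n_arms
        for arm, reward in self.history:
            if reward:
                succ[arm] += 1
            else:
                fail[arm] += 1
        # Draw one sample from each arm's Beta(1 + s, 1 + f) posterior; pick the best draw.
        draws = [self.rng.betavariate(1 + succ[a], 1 + fail[a]) for a in range(self.n_arms)]
        return max(range(self.n_arms), key=lambda a: draws[a])

    def update(self, arm: int, reward: int) -> None:
        self.history.append((arm, reward))


def simulate(steps: int = 4000, window: int = 200, seed: int = 1) -> list:
    """Two hypothetical scheduling policies whose success rates swap halfway through."""
    env = random.Random(seed)
    agent = SlidingWindowThompson(n_arms=2, window=window, seed=seed)
    choices = []
    for t in range(steps):
        # Assumed success probabilities; they swap at t = steps // 2 (non-stationarity).
        p = [0.8, 0.3] if t < steps // 2 else [0.3, 0.8]
        arm = agent.select()
        reward = 1 if env.random() < p[arm] else 0
        agent.update(arm, reward)
        choices.append(arm)
    return choices


if __name__ == "__main__":
    c = simulate()
    before, after = c[1500:2000], c[3500:4000]
    print("share of arm 0 before the switch:", before.count(0) / len(before))
    print("share of arm 1 after the switch:", after.count(1) / len(after))
```

The window is the design choice that matters here: a standard (unwindowed) Thompson sampler accumulates evidence for arm 0 during the first half and reacts slowly after the switch, whereas bounding the history lets the agent re-converge to the better arm within roughly one window length.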