In the information society, the explosion of data has increased congestion in the core network and degraded network performance. Random early detection (RED) is the active queue management (AQM) algorithm recommended by the Internet Engineering Task Force (IETF) in RFC 2309. However, RED is highly sensitive to both service load and its own parameter settings: it cannot fully utilize the bandwidth at a low service load, and it may incur long queuing delays at a high service load. This paper designs reinforcement learning AQM (RLAQM), a simple and practical variant of RED that holds the average queue length at a predictable value under various network loads, so that the queue size is no longer sensitive to the level of congestion. Q-learning is adopted to adjust the maximum discarding probability and derive the optimal control strategy. Simulation results indicate that RLAQM effectively overcomes the deficiencies of RED and achieves better congestion control, and that it improves network stability and performance in complex environments. Migrating from RED to RLAQM on Internet routers is also straightforward, because the only operation required is to adjust the discarding probability.
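
To make the control idea concrete, the following is a minimal sketch of tabular Q-learning tuning RED's maximum discarding probability so that the average queue length settles near a target. It is an illustration under stated assumptions, not the RLAQM design itself: the three-level state discretization, the three actions (lower, hold, raise), the reward (negative distance from the target), the step size, the thresholds, and the toy traffic model in `run` are all hypothetical choices made here for brevity.

```python
import random


def red_drop_prob(avg_q, min_th, max_th, max_p):
    """Classic RED drop probability for a given average queue length."""
    if avg_q < min_th:
        return 0.0
    if avg_q >= max_th:
        return 1.0
    return max_p * (avg_q - min_th) / (max_th - min_th)


def state_of(avg_q, target, band):
    """Assumed discretization: 0 = below the target band, 1 = inside, 2 = above."""
    if avg_q < target - band:
        return 0
    if avg_q > target + band:
        return 2
    return 1


def choose_action(q_table, s, epsilon, rng):
    """Epsilon-greedy choice over the actions of state s."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_table[s]))
    row = q_table[s]
    return row.index(max(row))


def apply_action(max_p, a, step=0.01, lo=0.01, hi=0.5):
    """Assumed actions: 0 = lower max_p, 1 = hold, 2 = raise (clamped to [lo, hi])."""
    return min(hi, max(lo, max_p + (a - 1) * step))


def q_update(q_table, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """Q(s,a) += alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q_table[s_next])
    q_table[s][a] += alpha * (reward + gamma * best_next - q_table[s][a])


def run(load=12, service=10, ticks=2000, seed=0):
    """Toy single-queue loop (hypothetical traffic model, not a network simulator)."""
    rng = random.Random(seed)
    min_th, max_th, target, band = 5.0, 15.0, 10.0, 1.0
    q_table = [[0.0] * 3 for _ in range(3)]  # 3 states x 3 actions
    queue, avg_q, max_p = 0, 0.0, 0.1
    s = state_of(avg_q, target, band)
    for _ in range(ticks):
        a = choose_action(q_table, s, 0.1, rng)
        max_p = apply_action(max_p, a)
        p = red_drop_prob(avg_q, min_th, max_th, max_p)
        # Each arriving packet is admitted with probability 1 - p.
        accepted = sum(1 for _ in range(load) if rng.random() >= p)
        queue = max(0, queue + accepted - service)
        avg_q = 0.9 * avg_q + 0.1 * queue  # exponentially weighted average
        s_next = state_of(avg_q, target, band)
        q_update(q_table, s, a, -abs(avg_q - target), s_next)
        s = s_next
    return max_p, avg_q, q_table


if __name__ == "__main__":
    final_max_p, final_avg_q, _ = run()
    print(f"max_p={final_max_p:.3f}, average queue={final_avg_q:.2f}")
```

Only `max_p` changes between steps, while the RED thresholds and the drop-probability formula stay fixed. This mirrors the migration argument above: an existing RED implementation would need nothing more than an external adjustment of its discarding probability.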