In fifth-generation (5G) mobile communications, network slicing is used to provide an optimal network for each of various services as a slice. In this paper, we propose a radio access network (RAN) slicing method that flexibly allocates RAN resources using deep reinforcement learning (DRL). In a RAN, the number of slices controlled by a base station fluctuates as users enter and leave the base station's coverage area and as the respective sets of user equipment switch services. Consequently, a resource-allocation scheme that depends on a fixed number of slices cannot allocate resources when the number of slices changes. We therefore consider a method whose optimal resource allocation is independent of the number of slices. Resource allocation is optimized using DRL, which learns the best action for each state through trial and error. To achieve independence from the number of slices, we present a model design that manages resources on a one-agent-per-slice basis using Ape-X, a distributed DRL method. Because Ape-X employs agents in parallel, models that learn various environments can be generated through trial and error across multiple environments. In addition, we design the model to satisfy the slicing requirements without over-allocating resources. With this design, resources can be allocated optimally, independently of the number of slices, simply by changing the number of agents. In the evaluation, we test multiple scenarios and show that the mean satisfaction of the slice requirements is approximately 97%.

INDEX TERMS Deep reinforcement learning, network slicing, RAN slicing, resource management.
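To make the one-agent-per-slice idea concrete, the following is a minimal, self-contained sketch, not the authors' implementation: it stands in for the Ape-X learner with a simple shared tabular policy, and the environment, reward shaping, and all names are hypothetical. It only illustrates how a shared policy lets the number of actors track a fluctuating slice count while the reward discourages over-allocation.

```python
# Conceptual sketch (hypothetical, simplified): one actor per slice sharing a
# single policy, so agents can be added or removed as slices appear/disappear.
# A real Ape-X system would use parallel actors feeding a prioritized replay
# buffer and a neural-network learner; a tabular stand-in is used here.
import random

class SharedPolicy:
    """Epsilon-greedy policy over discrete resource-block allocations,
    shared by every per-slice agent (stand-in for the Ape-X learner)."""
    def __init__(self, n_actions, epsilon=0.1):
        self.q = {}                      # (state, action) -> value estimate
        self.n_actions = n_actions
        self.epsilon = epsilon

    def act(self, state):
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions),
                   key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, alpha=0.1):
        key = (state, action)
        self.q[key] = self.q.get(key, 0.0) + alpha * (reward - self.q.get(key, 0.0))

class SliceAgent:
    """One actor per slice: observes that slice's state, requests resources."""
    def __init__(self, slice_id, policy):
        self.slice_id = slice_id
        self.policy = policy             # shared across all slice agents

    def step(self, state):
        return self.policy.act(state)

def reward(allocated, demanded):
    # Hypothetical shaping: meet the slice requirement, but penalize every
    # resource block allocated beyond the demand (no over-allocation).
    if allocated < demanded:
        return -1.0
    return 1.0 - 0.1 * (allocated - demanded)

policy = SharedPolicy(n_actions=10)      # actions: allocate 0..9 blocks
for episode in range(2000):
    n_slices = random.randint(1, 5)      # slice count fluctuates over time
    agents = [SliceAgent(i, policy) for i in range(n_slices)]
    for agent in agents:
        demand = random.randint(0, 9)    # toy per-slice traffic demand
        action = agent.step(demand)      # state = discretized demand
        policy.update(demand, action, reward(action, demand))
```

Because all agents share one policy, adding or removing a slice only changes how many actors are stepped each round, which mirrors the abstract's claim that allocation remains optimal independently of the number of slices.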