In the 5G core and the upcoming 6G core, the User Plane Function (UPF) transports data to and from subscribers in Protocol Data Unit (PDU) sessions. The UPF is generally implemented in software and packaged as either a virtual machine or a container that can be launched as a UPF instance with a specific resource requirement in a cluster. To reduce the resources consumed by UPF instances, the number of running UPF instances should depend on the number of PDU sessions required by customers; this number is often controlled by a scaling algorithm. In this paper, we investigate the application of Deep Reinforcement Learning (DRL) to scaling UPF instances that run in containers managed by the Kubernetes container-orchestration framework. We formulate a threshold-based reward function and adapt the proximal policy optimization (PPO) algorithm. We also apply a support vector machine (SVM) classifier to handle cases in which the agent suggests an unwanted action because of its stochastic policy. Extensive numerical results show that our approach outperforms Kubernetes's built-in Horizontal Pod Autoscaler (HPA). DRL saves 2.7-3.8% of the average number of Pods, while SVM achieves a saving of 0.7-4.5%, compared to HPA.
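For context on the baseline, the replica rule documented for the Kubernetes HPA can be sketched as follows. This is a minimal illustration, not the approach proposed in this paper: the function name, the metric values, and the replica bounds are hypothetical, and the 0.1 tolerance is assumed to be the Kubernetes default.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         tolerance=0.1, min_replicas=1, max_replicas=10):
    """Sketch of the documented HPA rule:
    desired = ceil(current_replicas * current_metric / target_metric).

    All parameter names and default bounds here are illustrative assumptions.
    """
    ratio = current_metric / target_metric
    # Skip scaling when the metric is within the tolerance band around the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical example: 4 Pods, observed metric 90 against a target of 60.
    print(hpa_desired_replicas(4, 90, 60))
```

Because this rule reacts only to the ratio between the observed and target metric, it offers a simple reference point against which a learned scaling policy can be compared.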