In many real-world tasks, acquiring features incurs a cost, which gives rise to the costly feature classification problem. In this study, we formulate the problem in the reinforcement learning framework and sequentially select a subset of features to balance classification error against feature cost. Specifically, we first solve the problem with the advantage actor-critic algorithm. Furthermore, to improve the learned policy and make it explainable, we employ Monte Carlo Tree Search to update the policy iteratively. We also consider the method's performance on imbalanced datasets. Our empirical evaluation shows that the method compares favorably with traditional methods.