Estimation of the dose-response curve for efficacy and subsequent selection of an appropriate dose in phase II trials are important processes in drug development. Various methods have been investigated to estimate dose-response curves.Generally, these methods are used with equal allocation of subjects for simplicity; nevertheless, they may not fully optimize performance metrics because of nonoptimal allocation. Optimal allocation methods, which include adaptive allocation methods, have been proposed to overcome the limitations of equal allocation. However, they rely on asymptotics, and thus sometimes cannot efficiently optimize the performance metric with the sample size in an actual clinical trial. The purpose of this study is to construct an adaptive allocation rule that directly optimizes a performance metric, such as power, accuracy of model selection, accuracy of the estimated target dose, or mean absolute error over the estimated dose-response curve. We demonstrate that deep reinforcement learning with an appropriately defined state and reward can be used to construct such an adaptive allocation rule. The simulation study shows that the proposed method can successfully improve the performance metric to be optimized when compared with the equal allocation, D-optimal, and TD-optimal methods. In particular, when the mean absolute error was set to the metric to be optimized, it is possible to construct a rule that is superior for many metrics.