Acupuncture treatment (AT), a traditional Chinese medicine (TCM) therapy, offers fewer side effects, quicker results, and lower cost for depressive insomnia than medication or psychological and cognitive therapy. To improve sleep quality, clinicians often combine multiple acupoints, such as Bai Hui (GV20), San Yin Jiao (SP6), and Shen Men (HT7), in a single AT session. Because the ancient literature on AT usually records only the general order of acupoints, the influence of the acupoint sequence on efficacy for a specific disease needs more discussion. In clinical practice, the order in which acupoints are treated depends mainly on the practitioner's experience, and there is no transparent quantitative model or evaluation method for generating credible acupoint sequences from a small number of cases. Optimizing the order of multiple acupoints in TCM acupuncture for depressive insomnia is therefore essential, both for relieving patients' symptoms and for the efficient use of national health care resources. To address these issues, this paper proposes a reinforcement learning-based method for optimizing the acupoint sequence in AT for depressive insomnia. Within the reinforcement learning framework, a post-AT electroencephalogram (EEG) signal prediction model, together with related interpretable models, represents the state transition of the AT environment, and a quantitative EEG signal-based AT efficacy model serves as the reward function. Thirty patients with depressive insomnia were recruited, and their EEG signals were collected during AT; these case data were used to quantify the efficacy of AT and to build the post-AT EEG signal prediction model. The two models were then applied to optimize the acupoint sequence through reinforcement learning.
The results were satisfactory, verifying the effectiveness and feasibility of the proposed method.
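The formulation described above, in which a prediction model supplies the state transition and an efficacy model supplies the reward, can be illustrated with a minimal Python sketch. Everything numeric here is an assumption made for illustration only: the two-feature EEG state, the `EFFECT` table, and the functions `predict_next_eeg` and `efficacy` are hypothetical toy stand-ins, not the models fitted to the patient data, and tabular Q-learning is used simply as one common reinforcement learning algorithm, not necessarily the one adopted in the paper.

```python
import itertools
import random

ACUPOINTS = ("GV20", "SP6", "HT7")

# Hypothetical per-acupoint effects on two toy EEG features (not from the paper).
EFFECT = {"GV20": (0.30, 0.05), "SP6": (0.10, 0.20), "HT7": (0.20, 0.10)}
INITIAL_EEG = (0.2, 0.2)  # hypothetical pre-treatment EEG feature vector


def predict_next_eeg(eeg, acupoint):
    """Toy stand-in for the post-AT EEG prediction model (state transition)."""
    s0, s1 = eeg
    e0, e1 = EFFECT[acupoint]
    # The second feature responds more strongly when the first is already high,
    # so the final state depends on the order in which acupoints are applied.
    return (s0 + e0 * (1 - s0), s1 + e1 * (1 - s1) * (0.5 + s0))


def efficacy(eeg):
    """Toy stand-in for the EEG-based efficacy model."""
    return 0.5 * eeg[0] + 0.5 * eeg[1]


def reward(eeg, next_eeg):
    """Reward for one step: the gain in the toy efficacy score."""
    return efficacy(next_eeg) - efficacy(eeg)


def rollout(sequence):
    """Total reward obtained by applying the acupoints in the given order."""
    eeg, total = INITIAL_EEG, 0.0
    for acupoint in sequence:
        next_eeg = predict_next_eeg(eeg, acupoint)
        total += reward(eeg, next_eeg)
        eeg = next_eeg
    return total


def q_learning(episodes=2000, alpha=0.5, gamma=1.0, seed=0):
    """Tabular Q-learning; the table is keyed by (acupoints used so far, next acupoint)."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        prefix, eeg = (), INITIAL_EEG
        while len(prefix) < len(ACUPOINTS):
            remaining = [a for a in ACUPOINTS if a not in prefix]
            # Purely random exploration is acceptable because Q-learning is off-policy.
            action = rng.choice(remaining)
            next_eeg = predict_next_eeg(eeg, action)
            new_prefix = prefix + (action,)
            future = max(
                (q.get((new_prefix, a), 0.0) for a in ACUPOINTS if a not in new_prefix),
                default=0.0,  # no acupoints left: terminal state
            )
            target = reward(eeg, next_eeg) + gamma * future
            old = q.get((prefix, action), 0.0)
            q[(prefix, action)] = old + alpha * (target - old)
            prefix, eeg = new_prefix, next_eeg
    return q


def greedy_sequence(q):
    """Read the learned acupoint order off the Q-table."""
    prefix = ()
    while len(prefix) < len(ACUPOINTS):
        remaining = [a for a in ACUPOINTS if a not in prefix]
        prefix += (max(remaining, key=lambda a: q.get((prefix, a), 0.0)),)
    return prefix


def brute_force_best():
    """Exhaustive baseline, feasible only because the toy problem has 3 acupoints."""
    return max(itertools.permutations(ACUPOINTS), key=rollout)


if __name__ == "__main__":
    learned = greedy_sequence(q_learning())
    print("learned order:", learned, "return:", rollout(learned))
    print("exhaustive best:", brute_force_best())
```

With only three acupoints the six possible orders can be enumerated directly, so the exhaustive baseline serves as a sanity check on the learned order; a learned approach becomes relevant as the number of candidate acupoints grows and enumeration is no longer practical.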