Our thoughts are inherently dynamic, often wandering far from our current situation. This unintentional shift in thought content, called mind wandering (MW), is crucial for understanding the nature of human thought. Although previous research has identified environmental and individual factors that influence MW, a comprehensive framework integrating these findings is still lacking. This study modeled MW by applying the idea of homeostasis to action selection and replicated various findings of MW research through simulations. We trained a homeostatic reinforcement learning (HRL) model, in which independent drives are assigned to task-related and other actions and drive-reducing actions are rewarded, on the sustained attention to response task. By manipulating environment and model parameters, the simulations reproduced both the change in response time to stimuli during MW and the proportion of MW reported in previous studies, suggesting that the model accurately captures the underlying mechanism of MW. Finally, we discuss the commonality between human thought and animal behavior and the possibility that the control of these phenomena shares a basic principle grounded in homeostasis.
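
The drive-reduction reward described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the common HRL formulation in which drive is the distance of an internal state from its setpoint and reward is the decrease in drive from one step to the next. The variable names, setpoints, decay rates, and gain are hypothetical, and a one-step greedy choice stands in for a trained policy.

```python
import math

# All constants below are hypothetical illustration values, not from the study.
SETPOINT = {"task": 1.0, "other": 1.0}   # ideal level of each internal variable
DECAY = {"task": 0.10, "other": 0.05}    # depletion of each variable per step
GAIN = 0.30                              # replenishment from the chosen action


def drive(state):
    """Euclidean distance of the internal state from its setpoint."""
    return math.sqrt(sum((SETPOINT[k] - state[k]) ** 2 for k in SETPOINT))


def step(state, action):
    """Deplete every variable, replenish the one the action serves,
    and return (next_state, reward), where reward is the drive reduction."""
    nxt = {k: max(0.0, state[k] - DECAY[k]) for k in state}
    nxt[action] = min(SETPOINT[action], nxt[action] + GAIN)
    return nxt, drive(state) - drive(nxt)


def greedy_action(state):
    """Pick the action with the largest one-step drive reduction
    (a stand-in for a learned policy, assumed for illustration)."""
    return max(SETPOINT, key=lambda a: step(state, a)[1])


if __name__ == "__main__":
    state = {"task": 1.0, "other": 1.0}
    counts = {"task": 0, "other": 0}
    for _ in range(200):
        action = greedy_action(state)
        state, _ = step(state, action)
        counts[action] += 1
    # Fraction of steps spent on the non-task ("MW-like") action.
    print(counts["other"] / 200)
```

In this sketch the agent switches between the task-related and the other action as each drive builds up, so the share of non-task steps depends on the assumed decay rates and gain. This loosely mirrors how the study varies environment and model parameters.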