This work is a follow-up to the authors' earlier research on the constrained optimal control problem applied to pollution accumulation. We consider a dynamic system governed by a diffusion process with multiple modes that depends on an unknown parameter. We study the components of the model and their restrictions, and we propose a solution scheme that determines (adaptive) policies maximizing a suitable discounted reward criterion by combining standard dynamic programming techniques with discrete estimation methods for the unknown parameter. Finally, we present a numerical example that illustrates our results using a particular case of the minimum least-squares error approximation method.