With the development of robot technology, we can expect self-propelled robots to work in large areas where cooperative and coordinated behavior by multiple (hardware and software) robots is necessary. However, it is not trivial for agents, which are the control programs running on robots, to determine the actions for their cooperative behavior, because suitable strategies depend on the characteristics of the environment and the capabilities of individual agents. Therefore, using continuous cleaning tasks by multiple agents as an example, we propose a meta-strategy method that decides the appropriate planning strategy for cooperation and coordination by learning the performance of individual strategies and the environmental data in a multi-agent system context, but without complex reasoning for deep coordination, because of the limited CPU capability and battery capacity. We experimentally evaluated our method by comparing it with a conventional method that assumes agents know in advance which locations should be visited frequently (because those locations become dirty easily). We found that agents using the proposed method performed as effectively as, and in complex areas outperformed, agents using the conventional method. Finally, we explain that this counterintuitive result arises from a division of work that emerges when autonomous agents split up their tasks based on local observations. We also discuss the limitations of the current method.