Once a communication module is integrated into traditional water equipment, the equipment becomes an Internet of Things (IoT) device. Whether such a battery-powered IoT device can be installed at a given location depends on whether its power consumption at that location allows it to reach the expected service life. This paper adopts strategies that reduce the power an IoT device consumes when sending data, so that more locations become feasible installation sites. When sending a sequence of data packets, an IoT device must sense its environment, interact with it, make a decision, and then adjust its policy according to the effect of the action. The transmission of a data packet sequence is therefore modelled as a Markov decision process (MDP): the state space consists of the real-time signal-to-interference-plus-noise ratio (SINR) of the channel and the transmission delay of the data packet sequence, the action space consists of immediate transmission and delayed transmission, and the objective is to minimize total power consumption. Because IoT devices are highly sensitive to power consumption and cannot collect large amounts of training data, this paper uses a Proximal Policy Optimization algorithm based on a prior distribution to perform few-shot reinforcement learning and quickly obtain the optimal decision sequence for the layout and location of IoT devices.
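To make the MDP formulation concrete, the following is a minimal Python sketch of the state, action, and cost structure described above. It is an illustration under stated assumptions, not the paper's implementation: the class `PacketMDP`, the helper `run_episode`, the Shannon-rate energy model, the uniform SINR sampling, and every numeric parameter are hypothetical. A fixed SINR threshold stands in for the learned policy; the Proximal Policy Optimization training itself is not shown.

```python
import math
import random
from dataclasses import dataclass

TRANSMIT, DELAY = 0, 1  # the two actions: immediate or delayed transmission


@dataclass
class PacketMDP:
    """Toy MDP for sending one packet; all numeric values are assumptions."""
    tx_power_w: float = 0.2       # assumed radio transmit power (W)
    idle_power_w: float = 0.001   # assumed power drawn while waiting (W)
    slot_s: float = 1.0           # assumed decision slot length (s)
    packet_bits: float = 1e4      # assumed packet size (bits)
    bandwidth_hz: float = 1.8e5   # assumed channel bandwidth (Hz)
    max_delay: int = 5            # assumed delay budget (slots)
    seed: int = 0

    def __post_init__(self):
        self.rng = random.Random(self.seed)
        self.reset()

    def _sample_sinr_db(self):
        # Assumption: SINR is drawn independently and uniformly each slot.
        return self.rng.uniform(-5.0, 20.0)

    def reset(self):
        self.delay = 0
        self.sinr_db = self._sample_sinr_db()
        return (self.sinr_db, self.delay)  # state = (SINR, delay so far)

    def tx_energy_j(self, sinr_db):
        # Assumption: airtime follows the Shannon rate, energy = power * airtime.
        sinr = 10 ** (sinr_db / 10)
        rate_bps = self.bandwidth_hz * math.log2(1 + sinr)
        return self.tx_power_w * self.packet_bits / rate_bps

    def step(self, action):
        # Transmission is forced once the delay budget is exhausted.
        if action == TRANSMIT or self.delay >= self.max_delay:
            energy = self.tx_energy_j(self.sinr_db)
            return (self.sinr_db, self.delay), -energy, True
        self.delay += 1
        self.sinr_db = self._sample_sinr_db()
        return (self.sinr_db, self.delay), -self.idle_power_w * self.slot_s, False


def run_episode(env, sinr_threshold_db):
    """Run one packet with a threshold policy; return total energy in joules."""
    state = env.reset()
    total_energy, done = 0.0, False
    while not done:
        sinr_db, _ = state
        action = TRANSMIT if sinr_db >= sinr_threshold_db else DELAY
        state, reward, done = env.step(action)
        total_energy -= reward  # reward is the negative of energy spent
    return total_energy
```

Calling `run_episode(PacketMDP(seed=1), 10.0)` returns the energy spent on one packet under a 10 dB threshold. In this sketch the reward is the negative energy, so maximizing return corresponds to the minimum-total-power objective, and waiting pays off only when the expected SINR improvement outweighs the idle cost.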