Convolutional Neural Networks (CNNs) are widely deployed on sensor-based edge computing devices. However, the high computational demands of CNN inference, the limited computing resources of edge terminal devices, and the significant architectural heterogeneity among these devices make it difficult for a single edge device to execute inference tasks locally. Collaborative inference across edge terminal devices can exploit idle computing and storage resources and reduce latency, thereby mitigating the computational burden imposed by CNNs. This paper targets the efficient collaborative execution of CNN inference tasks on heterogeneous, resource-constrained edge terminal devices. We propose a pre-partitioning deployment method for CNNs based on critical operator layers, and we reduce the bottleneck latency of the resulting pipeline-parallel execution through data compression, queuing, and a “micro-shifting” technique. Experimental results demonstrate that our method significantly accelerates CNN inference in heterogeneous environments, improving performance by 71.6% over existing popular frameworks.