Previous studies on perceptual grouping found that people can use spatiotemporal and featural information to group spatially separated rigid objects into a unit while tracking moving objects. However, few studies have tested the role of objects’ self‐motion information in perceptual grouping, although it is of great significance to motion perception in three‐dimensional space. In natural environments, objects always translate and rotate at the same time. The self‐rotation of objects seriously disrupts their rigidity and topology, creates conflicting motion signals and results in crowding effects. Thus, this study sought to examine the specific role played by self‐rotation information in grouping spatially separated non‐rigid objects, using a modified multiple object tracking (MOT) paradigm with self‐rotating objects. Experiment 1 found that people could use self‐rotation information to group spatially separated non‐rigid objects, even though this information was deleterious for attentive tracking and irrelevant to the task requirements; people seemed to use it strategically rather than automatically. Experiment 2 provided stronger evidence that this grouping advantage came from self‐rotation per se rather than from surface‐level cues arising from self‐rotation (e.g., similar 2D motion signals and common shapes). Experiment 3 changed the stimuli to more natural 3D cubes to strengthen the impression of self‐rotation and again found that self‐rotation improved grouping. Finally, Experiment 4 demonstrated that grouping by self‐rotation and grouping by changing shape were statistically comparable but additive, suggesting that they were two different sources of object information. Thus, grouping by self‐rotation mainly benefited from perceptual differences in motion flow fields rather than in deformation.
Overall, this study is the first attempt to identify self‐motion as a new feature that people can use to group objects in dynamic scenes, and it sheds light on debates about what entities/units we group and what kinds of information about a target we process while tracking objects.