Cooperative Vehicle and Infrastructure System (CVIS) and Autonomous Vehicle (AV) are two mainstream technologies for improving urban traffic efficiency and vehicle safety in the Intelligent Transportation System (ITS). However, significant obstacles must be overcome before fully unmanned applications are ready for widespread adoption in a transportation system. To achieve fully driverless driving, the perception ability of the vehicle should be accurate, fast, continuous, and wide-ranging. In this paper, an interactive perception framework is proposed that combines the visual perception of AV with the information interaction of CVIS. Based on this framework, an interactive perception-based multiple object tracking (IP-MOT) method is presented. IP-MOT consists of two parts. First, a Lidar-only multiple object tracking (L-MOT) method obtains the status of the surroundings using the voxel cluster algorithm. Second, the preliminary tracking result is fused with the interactive information to generate the trajectories of target vehicles. Two simulation platforms are established to verify the proposed methods: a CVIS simulation platform and a Virtual Reality (VR) test platform. The L-MOT algorithm is tested on a public dataset, and the IP-MOT algorithm is tested on our simulation platform. The results show that, by combining CVIS and AV, the IP-MOT algorithm can improve the accuracy of object tracking and expand the vehicle perception range.

INDEX TERMS Cooperative vehicle and infrastructure system, autonomous vehicle, perception mode, multiple object tracking.