Internet of Things (IoT) technologies enable millions of devices to transmit their sensor data to the outside world. The device–object pairing problem arises when a group of IoT things is tracked concurrently by cameras and sensors. Cameras view these things as visual “objects”, while the “sensing devices” attached to them continuously report their status. The challenge is that, to visualize these things in video, each thing's status must be displayed with the right object on screen, which requires correctly pairing visual objects with their sensing devices. Real-life examples abound. Recognizing a vehicle in a video does not imply that we can read its odometer and fuel gauge. Recognizing a pet on screen does not mean that we can correctly read the data from its collar. In more critical environments such as an ICU, visualizing all patients together with their physiological signals on screen would greatly relieve nurses’ burden. The underlying barrier is that a camera may see an object without seeing the device it carries, let alone that device's sensor readings. This paper addresses the device–object pairing problem and presents a multi-camera, multi-IoT-device system that visualizes a group of people together with their wearable devices’ data and demonstrates the ability to recover missing bounding boxes.