This study proposes AvatarTalk, a web-based platform for real-time, remotely controlled virtual puppetry (avatar) performance that combines traditional Taiwanese puppetry, the Internet of Things (IoT), and 3D avatar models. AvatarTalk enables puppeteers to control avatar puppets on the web using either motion capture gloves or camera-based image recognition. This approach not only facilitates remote performances and multi-screen presentations across various devices but also introduces a new gesture interpretation technique for dance artists. Through an IoT-based microservice concept, AvatarTalk can accommodate new control and puppet devices from other approaches. We develop a calibration procedure that enables accurate capture of hand gestures to manipulate the movements of virtual puppets, allowing them to perform fundamental traditional puppetry poses such as nodding, bowing, and synchronized hand movements. Experiments on the accuracy of AvatarTalk calibration indicate that AvatarTalk detects nearly all intended gestures (98.75%-100% recall) and seldom mistakes other gestures for them (90.8%-100% precision). We also provide a mechanism to measure the delays in controlling the puppets, and propose an analytic model to determine suitable delay times for the messages between a puppeteer's control device and the AvatarTalk server. In the current AvatarTalk implementation, if the message delay does not exceed 0.1 seconds, four puppeteers can synchronize their actions provided that the elapsed time between two consecutive actions of a puppeteer is longer than 0.3 seconds.
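
For reference, the recall and precision ranges quoted above can be read against the standard per-gesture definitions. The sketch below assumes the study follows these usual definitions; the symbols TP, FP, and FN (true positives, false positives, and false negatives for a given gesture class) are introduced here for illustration and do not come from the source.

```latex
\[
\mathrm{Recall} = \frac{TP}{TP + FN},
\qquad
\mathrm{Precision} = \frac{TP}{TP + FP}
\]
```

Under these definitions, a recall of 98.75% corresponds to about 1 in 80 performed instances of a gesture going undetected, and a precision of 90.8% corresponds to roughly 9 in 100 detections of that gesture being false alarms.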