In multi-camera video tracking, the tracking scene and the appearance of tracking targets can be complex, and current tracking methods rely on entirely different databases and evaluation criteria. Herein, for the first time to our knowledge, we present a universally applicable template-library updating approach for multi-camera human tracking, called multi-state self-learning template library updating (RS-TLU), which can be applied to different multi-camera tracking algorithms. In RS-TLU, self-learning divides tracking results into three states, namely the steady state, the gradually changing state, and the suddenly changing state, according to the similarity of each object to its historical and instantaneous templates, because each state requires a different decision strategy. Subsequently, the tracking results in each state are judged and learned using motion and occlusion information. Finally, the correct templates are chosen for the robust template library. We investigate the effectiveness of the proposed method using three databases and 42 test videos, counting the numbers of false positives, false matches, and missed tracking targets. Experimental results demonstrate that, in comparison with state-of-the-art algorithms on 15 complex scenes, our RS-TLU approach effectively increases the number of correct target templates and reduces the numbers of similar and erroneous templates in the template library.
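The three-state decision logic described above can be sketched in a few lines. This is a minimal illustration only: the threshold values, function names, and update rules are assumptions for clarity, not the paper's actual parameters.

```python
# Hypothetical sketch of the RS-TLU three-state classification and the
# state-specific library update. All thresholds and rules are illustrative
# assumptions, not values from the paper.

STEADY = "steady"
GRADUAL = "gradually changing"
SUDDEN = "suddenly changing"

def classify_state(sim_history, sim_instant, hi=0.85, lo=0.5):
    """Classify a tracking result by its similarity to the historical
    template (sim_history) and the instantaneous template (sim_instant)."""
    if sim_history >= hi and sim_instant >= hi:
        return STEADY    # appearance agrees with both templates
    if sim_instant >= hi and sim_history >= lo:
        return GRADUAL   # appearance is drifting slowly away from history
    return SUDDEN        # abrupt appearance change (e.g., occlusion)

def update_library(library, template, state, occluded, max_size=10):
    """Apply a state-specific decision strategy: steady results refresh the
    newest template, gradual changes are learned as new templates, and
    sudden changes are accepted only when occlusion cues rule out an error."""
    if state == STEADY:
        library[-1] = template      # refresh the most recent template
    elif state == GRADUAL:
        library.append(template)    # learn the new appearance
    elif state == SUDDEN and not occluded:
        library.append(template)    # trust only unoccluded sudden changes
    return library[-max_size:]      # bound the library size
```

In this sketch, a steady result overwrites the newest template so the library is not flooded with near-duplicates, while a suddenly changing result is admitted only when motion and occlusion information suggest it is genuine, mirroring the per-state strategies the abstract describes.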