Radio Frequency Identification (RFID) systems are widely used in applications such as supply chain management, inventory control, and object tracking. Identifying the RFID tags in a given tag population is the most fundamental operation in RFID systems. Although the Tree Walking (TW) protocol has become the industry standard for identifying RFID tags, little is known about the mathematical nature of this protocol, and only ad-hoc heuristics exist for optimizing it. In this paper, we first model the TW protocol analytically and then use that model to propose the Tree Hopping (TH) protocol, which optimizes TW both theoretically and practically. The key novelty of TH is to formulate tag identification as an optimization problem and find the optimal solution, which ensures the minimum average number of queries. With this solid theoretical underpinning, for tag population sizes ranging from 100 to 100K tags, TH significantly outperforms the best prior tag identification protocols on three metrics: the total number of queries per tag, the total identification time per tag, and the average number of responses per tag. The improvements average 50%, 10%, and 30%, respectively, when tag IDs are uniformly distributed in the ID space, and 26%, 37%, and 26%, respectively, when tag IDs are non-uniformly distributed.
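To make the "number of queries" metric concrete, the following is a minimal sketch of a generic tree-walking identification loop, written under our own simplifying assumptions rather than taken from the protocol specification or from the analytical model of this paper. The function name `tree_walk`, the error-free channel, and the choice to start from the empty prefix are all illustrative assumptions. The reader queries an ID prefix; no response means the subtree is empty, a single response identifies a tag, and a collision makes the reader descend into both child prefixes.

```python
def tree_walk(tag_ids, id_bits):
    """Simulate a generic tree-walking reader (illustrative sketch only).

    tag_ids: iterable of distinct non-negative integers below 2**id_bits.
    Returns (identified_ids, number_of_queries).
    """
    # Represent every tag ID as a fixed-width bit string.
    tags = [format(t, "0{}b".format(id_bits)) for t in set(tag_ids)]
    identified = []
    queries = 0
    stack = [""]  # assumption: the walk starts at the root (empty prefix)
    while stack:
        prefix = stack.pop()
        queries += 1
        # Tags whose ID begins with the queried prefix respond.
        responders = [t for t in tags if t.startswith(prefix)]
        if len(responders) == 1:
            # Exactly one response: the tag is read successfully.
            identified.append(int(responders[0], 2))
        elif len(responders) > 1:
            # Collision: visit the 0-child first, then the 1-child.
            stack.append(prefix + "1")
            stack.append(prefix + "0")
        # Zero responders: empty subtree, nothing more to do.
    return identified, queries


if __name__ == "__main__":
    ids, n_queries = tree_walk([0, 1, 6], id_bits=3)
    print(ids, n_queries, n_queries / len(ids))
```

In this sketch the queries-per-tag metric is simply `n_queries / len(ids)`. The wasted queries are the ones that end in a collision or in an empty subtree, and it is the number of such queries that an optimized choice of starting levels would aim to reduce.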