Target tracking applications in wireless sensor networks must achieve energy efficiency and tracking accuracy while meeting the real-time constraints imposed by fast-moving targets. From a layered-architecture perspective, joint optimization requires an energy-efficient cross-layer communication protocol that spans the medium access control (MAC) layer and the network routing layer. This cross-layer design is challenging because of interference and contention over the wireless medium, the limited resources of battery-operated sensor nodes, and the dynamic topology of large-scale networks. In this research, we exploit a cluster routing algorithm for large-scale networks and propose a low-duty-cycle MAC algorithm that reduces collisions, idle listening, and overhearing. Our work focuses on jointly optimizing the routing and MAC strategies to achieve a good trade-off among low delay, energy efficiency, and tracking accuracy. To deploy this protocol in a real tracking application, we also propose a clustering synchronization procedure that achieves network-wide time synchronization without distributing global timing information across the entire network. We develop an analytical model and conduct extensive simulations to evaluate our approach and compare its performance with existing protocols. The simulation and analysis results show that our approach achieves lower communication delay, and thus lower tracking error, while maintaining reasonable energy consumption compared with the existing protocols.