With sixth-generation (6G) communication technologies, target sensing can be completed within milliseconds. The mobile tracking-oriented Internet of Things (MTT-IoT), an emerging application network, relies on sensor nodes that cooperatively detect and track targets within their sensing ranges. Nevertheless, massive data processing and low-latency demands put tremendous pressure on the conventional architecture, in which sensing data is processed in a remote cloud, and the short transmission distance of 6G channels presents new challenges for network topology design. To cope with these difficulties, this paper proposes a new resource allocation scheme that performs fine-grained node scheduling and accurate tracking in mobile multitarget tracking networks. The dynamic tracking problem is formulated as an infinite-horizon Markov decision process (MDP) whose state space is extended to account for energy consumption, system response delay, and target importance. A model-free reinforcement learning method is applied to obtain satisfactory tracking actions through repeated iterations, in which intelligent agents interact directly with the complex environment. The performance of each episode is evaluated by the action-value function in search of the optimal reward. Simulation results demonstrate that the proposed scheme achieves excellent tracking performance in terms of energy cost and tracking delay.
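As a rough illustration of the model-free, action-value-based learning described above, the following minimal sketch uses tabular Q-learning on a toy node-scheduling problem. It is not the scheme proposed in this paper: the choice of Q-learning, the three candidate nodes, their energy and delay costs, the two-level target importance state, the alternating state transition, and the reward weights are all assumptions introduced only for illustration.

```python
import random

# Hypothetical toy setup (not from the paper): three candidate sensor nodes,
# each with an assumed energy cost and response delay per tracking step.
ENERGY = [1.0, 2.0, 4.0]   # assumed energy cost of activating each node
DELAY = [4.0, 2.0, 1.0]    # assumed response delay of each node
N_STATES = 2               # state = target importance: 0 (low), 1 (high)
N_ACTIONS = len(ENERGY)    # action = index of the node to activate


def step(state, action):
    """Return (reward, next_state) for the assumed toy environment."""
    # Delay is penalised more heavily for important targets (assumed weights).
    delay_weight = 3.0 if state == 1 else 0.25
    reward = -(ENERGY[action] + delay_weight * DELAY[action])
    # Assumed dynamics: the target alternates between low and high importance.
    next_state = 1 - state
    return reward, next_state


def q_learning(steps=20000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: estimate Q(s, a) from interaction only."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    state = 0
    for _ in range(steps):
        # Epsilon-greedy action selection.
        if rng.random() < epsilon:
            action = rng.randrange(N_ACTIONS)
        else:
            action = max(range(N_ACTIONS), key=lambda a: q[state][a])
        reward, next_state = step(state, action)
        # Temporal-difference update toward reward + discounted best next value.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each state."""
    return [max(range(N_ACTIONS), key=lambda a: q[s][a]) for s in range(N_STATES)]


if __name__ == "__main__":
    q_table = q_learning()
    print("Q-table:", q_table)
    print("Greedy node per importance level:", greedy_policy(q_table))
```

Because the assumed transition does not depend on the chosen action, the best node in each state is simply the one with the highest immediate reward: the low-energy node for the low-importance target and the low-delay node for the high-importance target. A state space that also tracks energy consumption and response delay, as in the formulation above, would enlarge the table but leave the update rule unchanged.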