To obtain accurate, dependable, and credible trajectory information from large-scale motion image data, this study proposes a robust mining algorithm based on trajectory extraction for massive motion data. The algorithm builds on an enhanced single-stage object detection model (TFF-SSD) and uses a 3D sparse convolutional neural network to extract point cloud features from the motion data, while spatial semantic features are obtained by fusing the data's spatial and semantic characteristics. A mutual attention fusion method then combines the point cloud features and spatial semantic features within a Faster R-CNN model, enabling rapid localization of moving-target regions and extraction of class and coordinate information from the data. Using reference points for the different types of motion data, the algorithm selects the feature regions associated with each motion target, yielding reference point information that describes continuous motion; fitting this reference point information produces reliable trajectory mining results. Experiments show that the algorithm accurately detects all crowd movements in a running race, and that the extracted motion features are diverse and rich in reference points, supporting trajectory mining with high quality, strong reliability, and authentic data content.
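As a concrete illustration of the final fitting step, the sketch below fits a smooth 2D trajectory to a sequence of timestamped reference points using least-squares polynomial regression. The `(t, x, y)` point layout, the polynomial degree, and the `fit_trajectory` helper are illustrative assumptions for this sketch, not details taken from the paper's method, which does not specify its fitting procedure.

```python
import numpy as np

def fit_trajectory(ref_points: np.ndarray, degree: int = 3):
    """Fit a smooth 2D trajectory to timestamped reference points.

    ref_points: array of shape (N, 3) holding (t, x, y) for each
    reference point extracted from the motion data (assumed format).
    Returns two polynomial models x(t) and y(t).
    """
    t, x, y = ref_points[:, 0], ref_points[:, 1], ref_points[:, 2]
    # Least-squares polynomial fit of each coordinate against time.
    px = np.polynomial.Polynomial.fit(t, x, degree)
    py = np.polynomial.Polynomial.fit(t, y, degree)
    return px, py

# Example: noisy reference points sampled along a synthetic curve.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 50)
points = np.stack(
    [t,
     2.0 * t + rng.normal(0.0, 0.1, t.size),        # x coordinate
     0.5 * t ** 2 + rng.normal(0.0, 0.1, t.size)],  # y coordinate
    axis=1,
)
px, py = fit_trajectory(points)
# Query the fitted trajectory at an arbitrary time.
print(px(5.0), py(5.0))
```

A simple polynomial model suffices here because the reference points describe continuous motion; for longer or more irregular trajectories, a piecewise fit such as a spline would be a natural substitute under the same interface.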