This paper constructs a traditional cultural text detection model using a fully convolutional network and a feature fusion method. Bilinear interpolation upsamples the feature maps according to the scale ratio between the original and target feature maps. Combined with transposed convolution, it restores the feature map to pixel space so that traditional cultural features can be extracted. A residual network is used to realize identity mapping, which allows the network to be deepened while improving its overall performance. Positive anchor-box samples are computed from the ground-truth text boxes to obtain the corresponding mapped coordinates on the input image. Using a depth-first search algorithm, all segments belonging to the same text are detected, linked, and combined. The results show that combining traditional culture with Civic Education helps improve students’ overall ability, with a reported 65% improvement in grades and a 55% improvement in ideology.
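
The segment-combining step mentioned above can be illustrated with a minimal sketch. It assumes that the detector has already produced a list of segment boxes and a list of pairwise links between segments that belong to the same text; neither data format is specified in the source, so the `(x1, y1, x2, y2)` box layout, the `links` pairs, and the function names `group_segments` and `merge_boxes` are all hypothetical. A depth-first search over the link graph collects each connected group of segments, and each group is then merged into one enclosing axis-aligned box (a simplification; the paper's actual combination rule may differ).

```python
from collections import defaultdict


def group_segments(num_segments, links):
    """Group segment indices into text instances with an iterative depth-first search.

    `links` is an assumed list of (i, j) pairs meaning segments i and j
    belong to the same text.
    """
    adjacency = defaultdict(list)
    for a, b in links:
        adjacency[a].append(b)
        adjacency[b].append(a)

    visited = set()
    groups = []
    for start in range(num_segments):
        if start in visited:
            continue
        # Explore everything reachable from `start` using an explicit stack.
        stack = [start]
        visited.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in visited:
                    visited.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(component))
    return groups


def merge_boxes(boxes, group):
    """Return the axis-aligned box (x1, y1, x2, y2) enclosing all segments in `group`."""
    x1 = min(boxes[i][0] for i in group)
    y1 = min(boxes[i][1] for i in group)
    x2 = max(boxes[i][2] for i in group)
    y2 = max(boxes[i][3] for i in group)
    return (x1, y1, x2, y2)


if __name__ == "__main__":
    # Hypothetical example: three linked segments form one text line,
    # and a fourth, unlinked segment forms its own text instance.
    boxes = [(0, 0, 10, 10), (10, 0, 20, 10), (20, 0, 30, 10), (50, 50, 60, 60)]
    links = [(0, 1), (1, 2)]
    for group in group_segments(len(boxes), links):
        print(group, merge_boxes(boxes, group))
```

An explicit stack is used instead of recursion so that long chains of segments do not run into Python's recursion limit.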