Just noticeable difference (JND) models built on low-level handcrafted visual features have reached a performance bottleneck. High-level semantics considerably affect perceptual attention and subjective video quality, yet most existing JND models do not adequately account for this effect, which leaves substantial room for improvement in semantic feature-based JND modeling. To address this gap, this paper investigates how heterogeneous semantic features induce visual attention responses from three aspects, i.e., object, context, and cross-object, to further improve the efficiency of JND models. On the object side, this paper first identifies the main semantic features that affect visual attention, including semantic sensitivity, object area and shape, and central bias, and then analyzes and quantifies how these heterogeneous visual features couple with the perceptual properties of the human visual system (HVS). Second, based on the reciprocity of objects and contexts, contextual complexity is measured to gauge the inhibitory effect of contexts on visual attention. Third, cross-object interactions are dissected using the principle of biased competition, and a semantic attention model is constructed in conjunction with a model of attentional competition. Finally, to build an improved transform-domain JND model, the semantic attention model is fused with the basic spatial attention model through a weighting factor. Extensive simulation results validate that the proposed JND profile is highly consistent with the HVS and highly competitive with state-of-the-art models.