Psychophysical and affective experiences while watching audiovisual content can serve as antecedents that determine the overall evaluation of the viewing experience. However, the layered relationships among these experiences have rarely been discussed. We investigated these experiences and their semantic structure in the context of watching hugging scenes. We selected 38 adjectives describing hugs and classified them into three layers: psychophysical, affective, and overall evaluation. Participants rated each of 24 videos containing hugging scenes on these adjectives. We then computed the structure among the three layers: four psychophysical factors affected four affective factors, which in turn affected two overall evaluation factors, namely joyful and reassuring. The statistical validity of the model was confirmed by structural equation modeling, and its semantic validity by experiments using dummy links. The results are expected to inform measures for enhancing affective experiences when viewing videos and the formulation of criteria for measuring and evaluating such experiences.