We evaluate QoE of multiview video and selectable audio (MVV-SA), in which users can switch not only video but also audio according to a viewpoint change request, transmitted over IP networks by a subjective experiment. The evaluation is performed by the semantic differential (SD) method with 13 adjective pairs. In the subjective experiment, we ask assessors to evaluate 40 stimuli which consist of two kinds of UDP load traffic, two kinds of fixed additional delay, five kinds of playout buffering time, and selectable or unselectable audio (i.e., MVV-SA or the previous MVV-A). As a result, MVV-SA gives higher presence to the user than MVV-A and then enhances QoE. In addition, we employ factor analysis for subjective assessment results to clarify the component factors of QoE. We then find that three major factors affect QoE in MVV-SA.