Until now, virtual assistants (such as Siri, Google Now, and Cortana) have been confined primarily to voice input and output. Is this voice-only confinement justified, or can the user experience be enhanced by adding visual output? We hypothesized that providing a higher level of visual/auditory immersion would enhance the quality of the user experience. To test this hypothesis, we first developed four variants of a virtual assistant, each with a different level of audio/visual immersion. The four systems were: audio only; audio with a 2D visual display; audio with a 3D visual display; and audio with an immersive 3D visual display. We then developed a plan for usability testing of all four variants. Usability testing was conducted with 30 subjects against eight dependent variables: presence, involvement, attention, reliability, dependency, easiness, satisfaction, and expectations. Each subject rated these dependent variables on a scale of 1 to 5, with 5 being the highest value. The raw data collected from the usability testing were then analyzed with several tools to determine the factors contributing to the quality of experience for each of the four variants. The significant factors were then used to develop a model that measures the quality of the user experience. Each variant was found to have a different set of significant variables; hence, rating each system requires a scale built on the unique set of variables for the respective variant. Furthermore, variant 4 (audio with an immersive 3D visual display) achieved the highest Quality of Experience (QoE) score. Lastly, several other qualitative conclusions were drawn from this research that will guide future work in the field of virtual assistants.
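
To illustrate the kind of per-variant analysis described above, the following is a minimal sketch, not the study's actual procedure: it assumes the ratings are stored in a hypothetical long-format file `ratings.csv` with one row per subject-variant pair, an assumed overall `qoe` rating column, and one column per dependent variable, and it uses an ordinary least squares regression (via `statsmodels`, one plausible choice of tool) to identify which of the eight factors are significant predictors of QoE for each variant.

```python
import pandas as pd
import statsmodels.api as sm

# The eight dependent variables rated on the 1-5 scale.
FACTORS = ["presence", "involvement", "attention", "reliability",
           "dependency", "easiness", "satisfaction", "expectations"]

def significant_factors(df: pd.DataFrame, variant: int, alpha: float = 0.05):
    """Fit OLS of the overall QoE rating on the eight factor ratings
    for one variant and return the factors with p-values below alpha."""
    sub = df[df["variant"] == variant]
    X = sm.add_constant(sub[FACTORS])       # design matrix with intercept
    model = sm.OLS(sub["qoe"], X).fit()
    pvals = model.pvalues.drop("const")     # ignore the intercept term
    return pvals[pvals < alpha].index.tolist()

# Hypothetical usage: columns are variant, qoe, plus the eight factors.
ratings = pd.read_csv("ratings.csv")
for v in range(1, 5):
    print(f"Variant {v} significant factors:", significant_factors(ratings, v))
```

Because each variant yields its own list of significant factors, a QoE scale for a given system would then be built only from that variant's surviving predictors, consistent with the finding that no single scale fits all four systems.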