Twenty students assigned to a haptics (experimental) or no-haptics (control) condition performed a "docking" task in which they sought the most favourable position between a ligand and a protein molecule, while their interactions with the model were logged. Improvement in students' understanding of biomolecular binding was previously measured by comparing written responses to a target conceptual question before and after interaction with the model. A log-profiling tool visualized students' movement of the ligand molecule during the docking task. Multivariate parallel coordinate analyses explored relationships across the entire student data set. The haptics group produced a tighter constellation of final docked ligand positions than the no-haptics group, coupled with docking profiles that depicted a more fine-tuned ligand traversal. Students in the no-haptics condition employed twice as many interactive behaviours concerned with switching between the different visual chemical representations offered by the model. In the no-haptics group, this visually intense processing was associated with erroneously 'fitting' the ligand at closer distances to the protein surface. Students who showed higher learning gains tended to engage in fewer visual representational switches and to be from the haptics group, while students with higher spatial ability also engaged in fewer visual representational switches, irrespective of assigned condition. From an information-processing standpoint, visual and haptic coordination may offload the visual pathway by placing less strain on visual working memory. From an embodied cognition perspective, visual and tactile sensorimotor interactions in the macroworld may provide access to constructing knowledge about submicroscopic phenomena. The results have cognitive and practical implications for the use of multimodal virtual reality technologies in educational contexts.