In recent years, cross-situational word learning (CSWL) paradigms have shown that novel words can be learned through implicit statistical learning. So far, CSWL studies with adult populations have focused on the presentation of spoken words (auditory information); however, words can also be learned through their written form (orthographic information). This study uses the CSWL paradigm to compare auditory and orthographic presentation of novel words with different degrees of phonological overlap. We also present lab-based and online approaches to running behavioural experiments: because of the COVID-19 pandemic, lab testing was terminated prematurely and testing continued online using a newly created online testing protocol. Analyses first compared accuracy and response times across modalities. Recognition was more accurate and faster when novel words were presented in their written form (orthographic condition) than in their spoken form (auditory condition). In addition, Bayesian modelling found that accuracy in the auditory condition was higher online than in the lab-based experiment, whereas performance in the orthographic condition was high in both experiments and generally exceeded that in the auditory condition. We discuss the implications of our findings for modality of presentation, as well as the benefits of our online testing protocol and its implementation in future research.