Most studies identifying the factors that affect digital text and multimedia comprehension have been conducted under controlled conditions. The present study sought to test whether the modality and seductive details effects, and the roles of verbal ability and working memory capacity, extend to a remote, self‐paced e‐learning scenario. Two hundred and thirteen first‐year undergraduates read or watched scientific expository content in three formats: digital text (written expository texts presented across seven screens), presentation video (audio explanation with written keywords), and presentation video with dynamic decorative images (audio explanation, written keywords, and dynamic, decorative, irrelevant images). In a face‐to‐face session, participants completed working memory and verbal ability tests. Comprehension performance was similar across the three conditions. For the videos with dynamic decorative, irrelevant images, comprehension depended on working memory capacity. Verbal ability was related to comprehension for both the expository text and the videos.