“…The problem of knowing what to collect from the web has also been treated in the digital library research community as a focused crawling problem. In focused crawling, the goal is to collect content about particular topics (Risse et al., 2012) or events (Klein, Balakireva, & Van de Sompel, 2018; Yang, Chitturi, Wilson, Magdy, & Fox, 2012), or to collect content that has a particular characteristic such as popularity (Page, Brin, Motwani, & Winograd, 1999), importance (Baeza-Yates, Marin, Castillo, & Rodriguez, 2005), or social engagement (Gossen, Demidova, & Risse, 2015; Milligan, Ruest, & Lin, 2016; Nwala, Weigle, & Nelson, 2018). Generally speaking, these approaches take the focus to be a topic, event, person, or organization that can be qualified by the types of media (documents, audio, video).…”