“…The literature offers many proposals for extracting data from HTML documents in general, rather than from tables specifically (Ferrara, de Meo, Fiumara, & Baumgartner, 2014; Sleiman & Corchuelo, 2013a). They rely on text alignment (Sleiman & Corchuelo, 2013b), neural networks (Sleiman & Corchuelo, 2014), learning first-order rules (Jiménez & Corchuelo, 2016a), inferring propositio-relational rules (Jiménez & Corchuelo, 2016b), learning decision trees (Uzun, Agun, & Yerlikaya, 2013), embedding graphs (Jiménez, Roldán, Gallego, & Corchuelo, 2020), or using n-grams and rendering information (Figueiredo, Assis, & Ferreira, 2017), to mention a few. Unfortunately, they do not seem appropriate for extracting the underlying relationships between the cells in HTML tables (Cafarella et al., 2018), which has motivated much work on table understanding (Roldán et al., 2020; Zhang & Balog, 2020).…”