“…These measures generally rely on surface-level characteristics of text, such as character, syllable, and word counts (missing citation). Although they have been widely used in studies of the understandability of health content retrieved by search engines (e.g., [4,5,6,7,8,9,18,21]), our preliminary work found that they are heavily affected by the method used to extract text from the HTML source [13]. We identified specific settings of an HTML preprocessing pipeline that provided consistent estimates but, lacking human assessments, could not investigate how well the estimates from each pipeline correlated with human judgments.…”
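
To illustrate why text extraction matters for surface-level measures, the following minimal sketch scores one HTML page under two extraction settings. It makes several assumptions that do not come from the passage: the Flesch-Kincaid Grade Level is used only as a familiar example of such a measure, the syllable counter is a crude vowel-group heuristic, and the helper names (`TextExtractor`, `extract_text`, `fk_grade`) and the sample page are hypothetical. It is not the preprocessing pipeline studied in [13].

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, optionally skipping the contents of given tags."""

    def __init__(self, skip_tags=()):
        super().__init__()
        self.skip_tags = set(skip_tags)
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.skip_tags:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.skip_tags and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside a skipped element.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, skip_tags=()):
    """Hypothetical extraction step: join all retained text nodes with spaces."""
    parser = TextExtractor(skip_tags)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def count_syllables(word):
    # Assumption: approximate syllables as runs of vowels (including 'y').
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fk_grade(text):
    """Flesch-Kincaid Grade Level from word, sentence, and syllable counts."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return 0.0
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_sentences = max(1, len(sentences))
    n_syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / n_sentences
            + 11.8 * n_syllables / len(words)
            - 15.59)


# Hypothetical sample page with navigation, style, and script content.
SAMPLE_HTML = """<html><head><style>body { margin: 0 }</style></head>
<body><nav>Home About Contact</nav>
<p>Hypertension is a chronic condition. It raises the risk of stroke.</p>
<script>var trackingIdentifier = 1;</script></body></html>"""

# Setting 1: keep every text node, including script, style, and menu text.
naive_score = fk_grade(extract_text(SAMPLE_HTML))

# Setting 2: drop elements that rarely carry readable content.
clean_score = fk_grade(extract_text(SAMPLE_HTML, skip_tags=("script", "style", "nav")))

print(f"all text nodes:     {naive_score:.2f}")
print(f"boilerplate removed: {clean_score:.2f}")
```

The two settings yield different grade levels for the same page, because menu labels and script identifiers change the word, sentence, and syllable counts that the formula depends on.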