“…Similarly, PDF parsing/extraction techniques were applied in 65% (n=15) of studies, the remaining studies applied extraction to other document formats (e.g., journal articles available online in HTML format; see . While similar methods, which additionally take into account syntactic structure, including chunking and dependency parsing were less frequently applied (Angrosh et al, 2014;Li et al, 2022;Nayak et al, 2021;Pertsas & Constantopoulos, 2018). Tagging methods, including PoS tagging (assigning grammatical categories, e.g., noun, verb), followed by concept tagging (e.g., semantic annotation), or sequence tagging, where labels were assigned based on order of appearance, were used in 43% (n=15) of studies.…”