“…DBLP [6] stores bibliographic data in XML format and includes, among others, authors, titles, and venues of computer science publications. Due to its availability and intuitiveness, the DBLP dataset has been used in many works for experimental purposes, e.g., as a collection of sets [44, 45], as a collection of trees [37, 38, 46], as a large hierarchical document [34, 40], and as a coauthor network graph [42, 49]. In this section, we show the impact of differences in the data preparation process that converts raw DBLP XML data into the desired input format.…”