“…This is in contrast to HTML-aware systems, which exploit the tree structure of HTML explicitly. This line of work began with interactive programming approaches in which the user provided various structural constraints [4, 27, 38]; since then, there has been greater focus on learning wrappers from examples in standard HTML query languages such as XPath or CSS [2, 10, 28-30, 32, 42], which has also been our focus in this work. XPath alignment approaches [28, 29] work by aligning and merging the steps within the XPaths of sample nodes based on edit distances, while least general generalization methods [32] produce the largest conjunction of the node attributes common to all samples.…”
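As a rough illustration of the least general generalization idea mentioned above, the following minimal sketch keeps only the attribute/value pairs shared by every sample node and renders them as an XPath with one predicate per shared attribute. It is a simplified illustration under stated assumptions, not the method of [32]: the dictionary-based node representation, the `"tag"` key, and the function names `least_general_generalization` and `to_xpath` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical node representation (an assumption for this sketch):
# attribute name -> value, with the element name stored under "tag".
Node = Dict[str, str]


def least_general_generalization(nodes: List[Node]) -> Node:
    """Return the attribute/value pairs shared by every sample node."""
    if not nodes:
        raise ValueError("need at least one sample node")
    common = dict(nodes[0])
    for node in nodes[1:]:
        # Drop any pair that this sample does not share.
        common = {k: v for k, v in common.items() if node.get(k) == v}
    return common


def to_xpath(common: Node) -> str:
    """Render the shared attributes as an XPath: one predicate per attribute."""
    tag = common.get("tag", "*")  # fall back to a wildcard if tags differ
    predicates = "".join(
        f'[@{name}="{value}"]'
        for name, value in sorted(common.items())
        if name != "tag"
    )
    return f"//{tag}{predicates}"


samples = [
    {"tag": "span", "class": "price", "id": "p1", "itemprop": "price"},
    {"tag": "span", "class": "price", "id": "p2", "itemprop": "price"},
]
lgg = least_general_generalization(samples)
print(to_xpath(lgg))  # //span[@class="price"][@itemprop="price"]
```

The differing `id` values are dropped because they are not common to both samples, so the resulting expression is the most specific conjunction that still matches every example. The alignment-based approaches of [28, 29] operate on XPath steps instead and are not covered by this sketch.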