“…Several recent works address the characterization and identification of structural patterns in text documents. For instance, Tannier, Girardot, and Mathieu (), building on earlier work by Lini, Lombardini, Paoli, Colazzo, and Sartiani () and Colazzo et al. (), describe an algorithm that assigns each XML element in a document to one of three categories: hard tag, soft tag, and jump tag.…”