Summarization of hierarchical data and metadata is a fundamental operation in applications across many domains. In particular, similarity search over hierarchical data, such as XML, would benefit greatly from concise and indexable summaries. This is especially true in P2P scenarios, where the search must be carried out in a distributed fashion across multiple peers. Such settings require summaries that are small, yet effective in identifying the peers that merit further exploration. In this paper, we propose a method, called propagation-vectors for trees (PVT), which constructs very concise and accurate summaries of hierarchical data, such as XML trees. We then show how to use these summaries to perform similarity search on the summarized data. The proposed summarization scheme relies on a label-propagation mechanism, which constructs an n-dimensional vector from a given tree with n unique data labels. Experimental results show that the constructed PVT summaries capture the structure of the input trees very accurately, that the representations are highly concise, and that search based on these summaries is faster than existing approaches.