The database summarization system SaintEtiQ provides multi-resolution summaries of structured data stored in a centralized database. Summaries are computed online with a conceptual hierarchical clustering algorithm. However, most companies work in distributed legacy environments, so the current centralized version of SaintEtiQ is either not feasible (because of privacy constraints) or not desirable (because of resource limitations). To address this problem, we propose new algorithms that generate a single summary hierarchy from two distinct hierarchies without scanning the raw data. The Greedy Merging Algorithm (GMA) takes all leaves of both hierarchies and generates the optimal partitioning of the considered data set with regard to a cost function (compactness and separation). A hierarchical organization of summaries is then built by agglomerating or dividing clusters, such that the cost function may emphasize local or global patterns in the data. We thus obtain two different hierarchies, depending on the optimization performed. However, this approach breaks down because of its exponential time complexity. To tackle this problem, we propose two alternative approaches with constant time complexity w.r.t. the number of data items. The first, called the Merge by Incorporation Algorithm (MIA), relies on the SaintEtiQ engine, whereas the second, named the Merge by Alignment Algorithm (MAA), rearranges summaries level by level in a top-down manner. We then compare these approaches using an original quality measure that quantifies how good the merged hierarchies are. Finally, an experimental study on real data sets shows that the merging processes (MIA and MAA) are efficient in terms of computational time.