Abstract. A new data structure, the set-trie, is proposed for storing and retrieving sets. Efficient manipulation of sets is vital in a number of systems, including data mining tools, object-relational database systems, and rule-based expert systems. The set-trie provides efficient algorithms for set containment operations: it allows fast access to subsets and supersets of a given parameter set. The performance of the operations is analyzed empirically in a series of experiments on real-world and artificial datasets. The analysis shows that sets can be accessed in O(c * |set|) time, where |set| is the size of the parameter set and c is a constant.
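To make the containment operations concrete, the following is a minimal Python sketch of a set-trie, not the authors' implementation. It assumes that each set is stored as a path of its elements in ascending order and that elements are mutually comparable; the names `SetTrie`, `exists_subset`, and `exists_superset` are hypothetical and chosen for illustration only.

```python
class SetTrieNode:
    def __init__(self):
        self.children = {}     # element -> SetTrieNode
        self.terminal = False  # True if a stored set ends at this node


class SetTrie:
    """Illustrative set-trie: every stored set is a root-to-node path
    whose labels are the set's elements in ascending order."""

    def __init__(self):
        self.root = SetTrieNode()

    def insert(self, s):
        node = self.root
        for x in sorted(s):
            node = node.children.setdefault(x, SetTrieNode())
        node.terminal = True

    def contains(self, s):
        node = self.root
        for x in sorted(s):
            node = node.children.get(x)
            if node is None:
                return False
        return node.terminal

    def exists_subset(self, s):
        """Return True if some stored set is a subset of s."""
        items = sorted(s)

        def walk(node, i):
            if node.terminal:
                return True
            # Only edges labelled with a remaining query element may be followed.
            for j in range(i, len(items)):
                child = node.children.get(items[j])
                if child is not None and walk(child, j + 1):
                    return True
            return False

        return walk(self.root, 0)

    def exists_superset(self, s):
        """Return True if some stored set is a superset of s."""
        items = sorted(s)

        def walk(node, i):
            if i == len(items):
                # All query elements matched; any set stored at or below
                # this node is a superset.
                return node.terminal or bool(node.children)
            for label, child in node.children.items():
                if label > items[i]:
                    continue  # paths ascend, so items[i] cannot appear later
                if walk(child, i + 1 if label == items[i] else i):
                    return True
            return False

        return walk(self.root, 0)


trie = SetTrie()
for stored in ({1, 3}, {1, 3, 5}, {2, 4}, {2, 3, 5}):
    trie.insert(stored)

print(trie.exists_subset({1, 2, 3}))  # True: {1, 3} is a subset
print(trie.exists_superset({3, 5}))   # True: {1, 3, 5} is a superset
print(trie.exists_superset({4, 5}))   # False
```

In this sketch an exact lookup follows a single path, so its cost is proportional to the size of the query set, while the subset and superset searches may branch and depend on the shape of the stored data.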
Discovery of multivalued dependencies from database relations is viewed as a search in a hypothesis space defined according to the generalisation relationship among multivalued dependencies. Two algorithms for the discovery of multivalued dependencies from relations are presented. The top-down algorithm enumerates the hypotheses from the most general to more specific hypotheses, which are checked on the input relation. The bottom-up algorithm first computes the invalid multivalued dependencies. Starting with the most general dependencies, the algorithm iteratively refines the set of dependencies to conform with each particular invalid dependency. The implementation of the algorithms is analysed and some empirical results are presented.
Set containment operations form an important tool in various fields such as information retrieval, AI systems, object-relational databases, and Internet applications. In the paper, a set-trie data structure for storing sets is considered, along with efficient algorithms for the corresponding set containment operations. We present a mathematical and an empirical study of the set-trie. In the mathematical study, the relevant upper bounds on its expected performance are established by utilizing a natural probabilistic model. In the empirical study, we give insight into how different distributions of input data impact the efficiency of the set-trie. Using the correct parameters for those randomly generated datasets, we expose the key sources of the input sensitivity of the set-trie. Finally, the empirical comparison of the set-trie with the inverted index is based on real-world datasets containing sets of low cardinality. The comparison shows that the set-trie consistently outperforms the inverted index in running time by orders of magnitude.
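For orientation, the inverted index used as the baseline in such comparisons can be sketched as follows. This is a generic textbook-style version under stated assumptions, not the implementation evaluated in the paper; the class name `InvertedIndex` and its methods are hypothetical. Each element maps to the identifiers of the stored sets that contain it, so superset queries intersect posting lists and subset queries count matches per stored set.

```python
from collections import Counter, defaultdict


class InvertedIndex:
    """Illustrative inverted index over a collection of sets."""

    def __init__(self):
        self.postings = defaultdict(set)  # element -> ids of sets containing it
        self.sizes = []                   # sizes[i] = cardinality of stored set i

    def insert(self, s):
        sid = len(self.sizes)
        self.sizes.append(len(s))
        for x in s:
            self.postings[x].add(sid)
        return sid

    def supersets_of(self, query):
        """Ids of stored sets that contain every element of query."""
        query = set(query)
        if not query:
            return set(range(len(self.sizes)))
        # Intersect posting lists, smallest first, to keep intermediates small.
        lists = sorted((self.postings.get(x, set()) for x in query), key=len)
        result = set(lists[0])
        for posting in lists[1:]:
            result &= posting
        return result

    def subsets_of(self, query):
        """Ids of stored sets whose elements all occur in query."""
        hits = Counter()
        for x in set(query):
            for sid in self.postings.get(x, ()):
                hits[sid] += 1
        # A stored set is a subset when every one of its elements was hit.
        result = {sid for sid, n in hits.items() if n == self.sizes[sid]}
        # Stored empty sets are subsets of any query.
        result |= {sid for sid, n in enumerate(self.sizes) if n == 0}
        return result


index = InvertedIndex()
for stored in ({1, 3}, {1, 3, 5}, {2, 4}, {2, 3, 5}):
    index.insert(stored)

print(index.supersets_of({3, 5}))      # {1, 3}: ids of {1, 3, 5} and {2, 3, 5}
print(index.subsets_of({1, 2, 3, 4}))  # {0, 2}: ids of {1, 3} and {2, 4}
```

Unlike a trie traversal, both queries here touch the full posting list of every query element, which is one plausible reason why the relative running times depend on the data.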