In this paper, heuristic query-driven clustering-based vertical fragmentation technique is efficiently developed. The intrinsic idea is to meet the idealistic case of DDBS design which comes to happen as each query attune “closely match” its relevant fragment. The proposed technique is mainly sought to breed clusters of queries in the first place. Consequently, these clusters would be tacitly used to generate intended disjoint fragments. Moreover, the allocation process has been considered so that replicated and non-replicated scenarios of data are applied. This technique basically meant to be efficaciously applicable at the initial stage of DDBS design without the need for data statistics or empirical results, in either dynamic or static DDBS environment. Many existing design-related techniques are being incorporated to make a promising work, particularly as communication costs being the foremost design objective. Throughout this work, the experimental results and internal evaluation are extensively illustrated to demonstrate the effectiveness and validity of proposed technique.
Similarity measures have long been utilized in information retrieval and machine learning domains for multi-purposes including text retrieval, text clustering, text summarization, plagiarism detection, and several other text-processing applications. However, the problem with these measures is that, until recently, there has never been one single measure recorded to be highly effective and efficient at the same time. Thus, the quest for an efficient and effective similarity measure is still an open-ended challenge. This study, in consequence, introduces a new highly-effective and time-efficient similarity measure for text clustering and classification. Furthermore, the study aims to provide a comprehensive scrutinization for seven of the most widely used similarity measures, mainly concerning their effectiveness and efficiency. Using the K-nearest neighbor algorithm (KNN) for classification, the K-means algorithm for clustering, and the bag of word (BoW) model for feature selection, all similarity measures are carefully examined in detail. The experimental evaluation has been made on two of the most popular datasets, namely, Reuters-21 and Web-KB. The obtained results confirm that the proposed set theory-based similarity measure (STB-SM), as a pre-eminent measure, outweighs all state-of-art measures significantly with regards to both effectiveness and efficiency.
In Distributed Database Systems (DDBS), communication costs and response time have long been open-ended challenges. Nevertheless, when DDBS is carefully designed, the desired reduction in communication costs will be achieved. Data fragmentation (data clustering) and data allocation are on popularity as the prime strategies in constant use to design DDBS. Based on these strategies, on the other hand, several design techniques have been presented in the literature to improve DDBS performance using either empirical results or data statistics, making most of them imperfect or invalid particularly, at least, at the initial stage of DDBSs design. In this paper, thus, a heuristic k-means approach for vertical fragmentation and allocation is introduced. This approach is primarily focused on DDBS design at the initial stage. Many techniques are being joined in a step to make a promising work. A brief yet effective experimental study, on both artificially-created and real datasets, has been conducted to demonstrate the optimality of the proposed approach, comparing with its counterparts, as the obtained results has been shown encouraging.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.