A data stream is a massive, unbounded sequence of data elements generated continuously at a high rate. Stream databases raise new challenges for query processing because streaming data changes constantly over time and because users submit a wider range of queries than in traditional databases. In this paper, we propose a system architecture with components for both distributed indexing of streaming data and distributed processing of range queries over it, so that both indexing and querying can be done in a distributed fashion. We also design a distributed B+Tree indexing method using the map-reduce programming model of the Apache Spark framework; it creates small B+Tree indexes on the machines of a Spark cluster instead of one large, centralized B+Tree index. Moreover, we propose a distributed range search algorithm that processes range queries in parallel using the set of small B+Tree indexes. Through several experiments, we demonstrate that the proposed distributed B+Tree indexing method is scalable and efficient compared with existing indexing methods and can therefore be used for applications involving data streams with a large volume of data elements and a large number of range queries.
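The idea of many small per-machine indexes queried in parallel can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes hash partitioning, uses a sorted list with binary search as a stand-in for each small B+Tree, and uses a local thread pool in place of a Spark cluster. All function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from concurrent.futures import ThreadPoolExecutor


def build_partition_indexes(keys, num_partitions):
    """Hash-partition the keys and build one small ordered index per partition.

    Assumption: a sorted list stands in for each small B+Tree. Both keep keys
    in order and support range scans, but this is not a real B+Tree.
    """
    partitions = [[] for _ in range(num_partitions)]
    for key in keys:
        partitions[hash(key) % num_partitions].append(key)
    return [sorted(partition) for partition in partitions]


def local_range_search(index, low, high):
    """Return all keys in the closed range [low, high] from one partition."""
    return index[bisect_left(index, low):bisect_right(index, high)]


def distributed_range_search(indexes, low, high):
    """Query every partition in parallel, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        # "Map" step: each partition answers the range query on its own index.
        parts = list(pool.map(lambda idx: local_range_search(idx, low, high), indexes))
    # "Reduce" step: combine the per-partition answers into one sorted result.
    return sorted(key for part in parts for key in part)


indexes = build_partition_indexes([5, 1, 9, 3, 7, 12, 8], num_partitions=3)
print(distributed_range_search(indexes, 3, 8))  # [3, 5, 7, 8]
```

Because keys are hash-partitioned here, every partition must be consulted for each range query. A range-partitioned layout would let a query skip partitions whose key range does not overlap it, at the cost of possible load imbalance.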
Replica placement is a classic problem: finding the optimal locations at which to deploy servers. It applies in many fields, particularly industry, in addition to computer networks. Several methods have been proposed for selecting such locations optimally. These methods try to optimize two parameters: the quality of the selected locations and the execution time of the algorithm. An efficient method is therefore one that selects locations as close to optimal as possible while running at an acceptable speed. The problem is NP-complete, so heuristic methods are used to solve it. Among the methods proposed for the replica placement (server location) problem, the best in terms of time complexity runs in O(N · max(log N, K)). This study designs and implements a method based on the genetic algorithm. Its execution time is much lower than that of algorithms whose selected locations are close to optimal.