Abstract. During the past decade, a vast number of GPS devices have produced massive amounts of data containing both temporal and spatial information. This poses a great challenge for traditional spatial databases. With the development of distributed cloud computing, many high-performance cloud platforms have been built, which can be used to process such spatio-temporal data. In this research, to store and process data in an effective and green way, we propose the following solutions: firstly, we build a Hadoop cloud computing platform using Cubieboards2, an ARM development board with A20 processors; secondly, we design two types of indexes for different types of spatio-temporal data at the HDFS level. We use a specific partitioning strategy to divide the data so as to ensure load balancing and efficient range queries. To improve the efficiency of disk utilisation and network transmission, we also optimise the storage structure. The experimental results show that our cloud platform is highly scalable, and that the two types of indexes are effective for spatio-temporal data storage optimisation and help achieve high retrieval efficiency.
Designing efficient scheduling strategies for different environments is a hot topic in cloud computing. In the private clouds of university computer science labs, there are several kinds of tasks with different resource requirements, constraints, and lifecycles, such as IT infrastructure tasks, course design tasks submitted by undergraduate students, deep learning tasks, and so forth. Taking the actual needs of our laboratory as an example, these tasks are analyzed and then scheduled by different scheduling strategies. The Batch Scheduler is designed to process tasks during rush periods to improve system throughput. A dynamic scheduling algorithm is proposed to handle tasks with long lifecycles, such as deep learning tasks, which are hungry for GPU resources and have dynamically changing priorities. Experiments show that the scheduling strategies proposed in this paper improve resource utilization and efficiency.
The reverse skyline query is an extension of the classical skyline query and is widely used in decision support for e-business. The rapid growth of big data in e-business challenges the classical algorithms for such queries. This paper provides a novel definition of the decision set and a decision-set-based reverse skyline query method, called DRS, which runs on a double-layer R-tree index in a MapReduce manner. Theoretical proofs are provided for the correctness and complexity of the DRS algorithm. Experiments on several large data sets are presented and analyzed to illustrate the applicability of DRS and its performance advantage over state-of-the-art reverse skyline query methods.
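For readers unfamiliar with the query itself, the following is a minimal brute-force sketch of the standard reverse skyline definition, not the DRS method described above. It assumes the usual formulation: a point p belongs to the reverse skyline of a query q if no other data point is at least as close to p as q is in every dimension and strictly closer in at least one. The function names `dynamically_dominates` and `reverse_skyline` are illustrative, and the quadratic scan uses no index and no MapReduce.

```python
from typing import List, Sequence

Point = Sequence[float]


def dynamically_dominates(a: Point, b: Point, ref: Point) -> bool:
    """Return True if `a` dynamically dominates `b` with respect to `ref`.

    That is, `a` is at least as close to `ref` as `b` in every dimension
    and strictly closer in at least one dimension.
    """
    da = [abs(x - r) for x, r in zip(a, ref)]
    db = [abs(x - r) for x, r in zip(b, ref)]
    no_worse = all(x <= y for x, y in zip(da, db))
    strictly_better = any(x < y for x, y in zip(da, db))
    return no_worse and strictly_better


def reverse_skyline(points: List[Point], q: Point) -> List[Point]:
    """Brute-force reverse skyline of `q` over `points` (O(n^2 * d)).

    A point p is kept when no other data point dynamically dominates q
    with respect to p, i.e. q lies in the dynamic skyline of p.
    """
    result = []
    for i, p in enumerate(points):
        dominated = any(
            dynamically_dominates(other, q, p)
            for j, other in enumerate(points)
            if j != i
        )
        if not dominated:
            result.append(p)
    return result
```

A call such as `reverse_skyline([(1, 1), (2, 2), (5, 5), (9, 1)], (3, 3))` drops `(1, 1)`, because `(2, 2)` is closer to it than the query in both dimensions. The quadratic cost of this baseline is what index-based and distributed methods aim to avoid.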