Purpose – The purpose of this paper is to integrate and optimize a multiple big data processing platform with the features of high performance, high availability and high scalability in big data environment. Design/methodology/approach – First, the integration of Apache Hive, Cloudera Impala and BDAS Shark make the platform support SQL-like query. Next, users can access a single interface and select the best performance of big data warehouse platform automatically by the proposed optimizer. Finally, the distributed memory storage system Memcached incorporated into the distributed file system, Apache HDFS, is employed for fast caching query results. Therefore, if users query the same SQL command, the same result responds rapidly from the cache system instead of suffering the repeated searches in a big data warehouse and taking a longer time to retrieve. Findings – As a result the proposed approach significantly improves the overall performance and dramatically reduces the search time as querying a database, especially applying for the high-repeatable SQL commands under multi-user mode. Research limitations/implications – Currently, Shark’s latest stable version 0.9.1 does not support the latest versions of Spark and Hive. In addition, this series of software only supports Oracle JDK7. Using Oracle JDK8 or Open JDK will cause serious errors, and some software will be unable to run. Practical implications – The problem with this system is that some blocks are missing when too many blocks are stored in one result (about 100,000 records). Another problem is that the sequential writing into In-memory cache wastes time. Originality/value – When the remaining memory capacity is 2 GB or less on each server, Impala and Shark will have a lot of page swapping, causing extremely low performance. When the data scale is larger, it may cause the JVM I/O exception and make the program crash. However, when the remaining memory capacity is sufficient, Shark is faster than Hive and Impala. Impala’s consumption of memory resources is between those of Shark and Hive. This amount of remaining memory is sufficient for Impala’s maximum performance. In this study, each server allocates 20 GB of memory for cluster computing and sets the amount of remaining memory as Level 1: 3 percent (0.6 GB), Level 2: 15 percent (3 GB) and Level 3: 75 percent (15 GB) as the critical points. The program automatically selects Hive when memory is less than 15 percent, Impala at 15 to 75 percent and Shark at more than 75 percent.
This paper introduces a high-performed high-availability in-cloud enterprise resources planning (in-cloud ERP) which has deployed in the virtual machine cluster. The proposed approach can resolve the crucial problems of ERP failure due to unexpected downtime and failover between physical hosts in enterprises, causing operation termination and hence data loss. Besides, the proposed one together with the access control authentication and network security is capable of preventing intrusion hacked and/or malicious attack via internet. Regarding system assessment, cost-performance (C-P) ratio, a remarkable cost effectiveness evaluation, has been applied to several remarkable ERP systems. As a result, C-P ratio evaluated from the experiments shows that the proposed approach outperforms two well-known benchmark ERP systems, namely, in-house ECC 6.0 and in-cloud ByDesign.
The service-oriented hosts in enterprises like ERP system have always encountered the crucial problem of unexpected down-time or system failure that will cause data loss and system termination. Failover is a challenge issue that it cannot be done successfully between physical hosts. Therefore this paper introduces high availability architecture to build an in-cloud Enterprise Resources Planning together with access control authentication deployed in the virtual machine cluster, which can resolve the problems mentioned above. Access control authentication has been designed in the cloud computing system to prevent the service-oriented hosts form external fraud or intrusion. As a result, the approach proposed in this paper outperforms two well-known benchmark ERP systems, in-house ECC 6.0 and in-cloud ByDesign.
The service-oriented hosts in enterprises like enterprise resources planning (ERP) system have always encountered the crucial problem of unexpected down-time or system failure that will cause data loss and system termination. Failover is a challenge issue that cannot be done successfully between physical hosts. Traditional information security using demilitarized zone approach costs a lot. Therefore, this paper introduces in-cloud enterprise resources planning (in-cloud ERP) deployed in the virtual machine cluster together with access control authentication and network security which can resolve the three problems mentioned above. Access control authentication and network security have been implemented in the cloud computing system to prevent the service-oriented hosts form external fraud, intrusion, or malicious attacks. As a result of the experiments the number of accessing in-cloud ERP is 5.2 times as many as in-house ERP. The total expenditure of in-cloud ERP has decreased significantly to 48.4 % the cost of in-house ERP. In terms of operational speed, the approach proposed in this paper outperforms two well-known benchmark ERP systems, in-house ECC 6.0 and in-cloud ByDesign.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.