The use of big data in various fields has led to a rapid increase in a wide variety of data resources, and various data analysis technologies such as standardized data mining and statistical analysis techniques are accelerating the continuous expansion of the big data market. An important characteristic of big data is that data from various sources have life cycles from collection to destruction, and new information can be derived through analysis, combination, and utilization. However, each phase of the life cycle presents data security and reliability issues, making the protection of personally identifiable information a critical objective. In particular, user tendencies can be analyzed using various big data analytics, and this information leads to the invasion of personal privacy. Therefore, this paper identifies threats and security issues that occur in the life cycle of big data by confirming the current standards developed by international standardization organizations and analyzing related studies. In addition, we divide a big data life cycle into five phases (i.e., collection, storage, analytics, utilization, and destruction), and define the security taxonomy of the big data life cycle based on the identified threats and security issues.
Background With the recent use of IT in health care, a variety of eHealth data are increasingly being collected and stored by national health agencies. As these eHealth data can advance the modern health care system and make it smarter, many researchers want to use these data in their studies. However, using eHealth data brings about privacy and security concerns. The analytical environment that supports health care research must also consider many requirements. For these reasons, countries generally provide research platforms for health care, but some data providers (eg, patients) are still concerned about the security and privacy of their eHealth data. Thus, a more secure platform for health care research that guarantees the utility of eHealth data while focusing on its security and privacy is needed. Objective This study aims to implement a research platform for health care called the health care big data platform (HBDP), which is more secure than previous health care research platforms. The HBDP uses attribute-based encryption to achieve fine-grained access control and encryption of stored eHealth data in an open environment. Moreover, in the HBDP, platform administrators can perform the appropriate follow-up (eg, block illegal users) and monitoring through a private blockchain. In other words, the HBDP supports accountability in access control. Methods We first identified potential security threats in the health care domain. We then defined the security requirements to minimize the identified threats. In particular, the requirements were defined based on the security solutions used in existing health care research platforms. We then proposed the HBDP, which meets defined security requirements (ie, access control, encryption of stored eHealth data, and accountability). Finally, we implemented the HBDP to prove its feasibility. Results This study carried out case studies for illegal user detection via the implemented HBDP based on specific scenarios related to the threats. As a result, the platform detected illegal users appropriately via the security agent. Furthermore, in the empirical evaluation of massive data encryption (eg, 100,000 rows with 3 sensitive columns within 46 columns) for column-level encryption, full encryption after column-level encryption, and full decryption including column-level decryption, our approach achieved approximately 3 minutes, 1 minute, and 9 minutes, respectively. In the blockchain, average latencies and throughputs in 1Org with 2Peers reached approximately 18 seconds and 49 transactions per second (TPS) in read mode and approximately 4 seconds and 120 TPS in write mode in 300 TPS. Conclusions The HBDP enables fine-grained access control and secure storage of eHealth data via attribute-based encryption cryptography. It also provides nonrepudiation and accountability through the blockchain. Therefore, we consider that our proposal provides a sufficiently secure environment for the use of eHealth data in health care research.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.