This article introduces the concept of big data security and its main features. It illustrates the need for security in healthcare systems as the volume of data grows continuously over time. The necessity of big data security and the phases of big data analytics are highlighted, and several privacy-preserving strategies for big data are covered. Many digital storage solutions in use today are designed to work only with data in a fixed format. This paper introduces some methods for maintaining metadata obliviousness. The oblivious RAM technology discussed in the article addresses security concerns and can cope with the daily increase in data in several industries. Security needs are introduced at many phases of big data creation, such as information extraction, storage, and analytics. Additionally, the paper presents several data recovery methods for restoring the original data in the event of a data crash. It covers several data categorization methods for sorting data into normal and sensitive categories, as well as methods for anomaly detection. It also discusses the advantages and disadvantages of various security measures.

INDEX TERMS Security, privacy, obliviousness, data recovery.
I. INTRODUCTION
This section introduces the core concepts and phases of big data. Security requirements arise at several stages of big data production, such as data generation, data storage, and data analytics. The section discusses the importance of security in big data, cloud, and Internet of Things (IoT) infrastructure, along with several security measures for defending against various kinds of attacks. It also discusses the security of healthcare systems: it gives a general overview of the medical industry and explains how patient data must be safeguarded when it is stored in a distributed setting.
A. BIG DATA
Big data is data that exhibits large volume, heterogeneity, speed, and volatility all at the same time. The term "high volume" denotes the significant quantity of data that is produced daily by a variety of companies and organizations. Heterogeneity refers to the rapid proliferation of new data types, many of which will be incorporated into existing datasets acquired from a variety of companies and devices. The term "high speed" refers to the rapid rate at which data is collected or captured from a variety of social networking sites into a database [1]. Huge amounts of data present additional difficulties for the security systems already in place because of the variety, volume, and unpredictability of the data. Most data storage models used today are designed to work with properly structured data. The currently available encryption methods are inefficient for massive data because generating keys, encrypting the data, and then decrypting it takes a significant amount of time. For instance, the data collected from patients, various sensors, soc...