Big data refers to large collections of datasets drawn from heterogeneous sources, which may amount to terabytes or petabytes of data. Big data is useful for growing existing businesses and also supports the creation of new ones. Handling this much data is very difficult in a database management system. The main problems of big data are storage, processing, analysis, extraction, and privacy. This survey paper focuses mainly on the challenges of big data, on how to extract the required data from a large volume of data, and on various clustering algorithms. For data extraction, the MapReduce function is used, which is mainly used in the Google search engine.
Introduction
Today, data are growing tremendously through social networking services such as Facebook and Twitter and through mobile devices [1]. Every day, 2.5 quintillion bytes of heterogeneous data are generated; these data are considered big data, and handling such a large amount of data is a very challenging issue for users [2]. The main challenges are system capabilities, algorithm design to extract the required data, online processing, security and privacy, and the business model [3]. Big data is heterogeneous and decentralized, which creates an extreme challenge in discovering the required information from it [4]. The data are divided into two types [5]: structured data, which are available in the form of rows and columns, and unstructured data, such as documents, PDFs, text, images, videos, and executable files [6]. Big data management is characterized by five V's: volume, variety, velocity, variability, and value, as shown in Fig. 1. Volume indicates the size of