The rapid growth of digital data collected from many sources is rendering traditional storage, processing, and analysis methods obsolete. To overcome these limitations, new technologies for storing and processing very large datasets have been developed. Big data processing transforms massive amounts of raw data into usable information and knowledge in a more understandable form. As a result, numerous big data processing frameworks have emerged, but selecting the appropriate framework for a given big data application remains a significant challenge. Therefore, this paper investigates the possible influence of big data challenges and discusses in depth the most well-known approaches to big data processing, which are divided into five classes: batch processing, stream processing, real-time processing, interactive processing, and hybrid processing. It also covers the most popular frameworks associated with these classes, such as Apache Hadoop, Dryad, Samza, IBM InfoSphere, Storm, Amazon Kinesis, Drill, Impala, Flink, and Spark. Furthermore, this study compares the features of these frameworks, highlighting their strengths and drawbacks. It can thus serve as a guideline for choosing the best framework for IT analytics applications and help business users make faster decisions.