Big Data has proved to be vast and complex, without being efficiently manageable through traditional architectures, whereas data analysis is considered crucial for both technical and non-technical stakeholders. Current analytics platforms are siloed for specific domains, whereas the requirements to enhance their use and lower their technicalities are continuously increasing. This paper describes a domain-agnostic single access autoscaling Big Data analytics platform, namely Diastema, as a collection of efficient and scalable components, offering user-friendly analytics through graph data modelling, supporting technical and non-technical stakeholders. Diastema’s applicability is evaluated in healthcare through a predicting classifier for a COVID19 dataset, considering real-world constraints.
Big Data is a phenomenon that affects today’s world, with new data being generated every second. Today’s enterprises face major challenges from the increasingly diverse data, as well as from indexing, searching, and analyzing such enormous amounts of data. In this context, several frameworks and libraries for processing and analyzing Big Data exist. Among those frameworks Hadoop MapReduce, Mahout, Spark, and MLlib appear to be the most popular, although it is unclear which of them best suits and performs in various data processing and analysis scenarios. This paper proposes EverAnalyzer, a self-adjustable Big Data management platform built to fill this gap by exploiting all of these frameworks. The platform is able to collect data both in a streaming and in a batch manner, utilizing the metadata obtained from its users’ processing and analytical processes applied to the collected data. Based on this metadata, the platform recommends the optimum framework for the data processing/analytical activities that the users aim to execute. To verify the platform’s efficiency, numerous experiments were carried out using 30 diverse datasets related to various diseases. The results revealed that EverAnalyzer correctly suggested the optimum framework in 80% of the cases, indicating that the platform made the best selections in the majority of the experiments.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.