15 years of Big Data: a systematic literature review

The article is dedicated to exploring data analytics approaches within the context of business digital transformation. The role of data in enhancing the efficiency of enterprises is highlighted, as data facilitates informed managerial decision–making and strengthens competitive advantages. It is emphasized that the effective use of data requires not only advanced infrastructure and personnel competencies but also a systematic approach that integrates various methods of data analysis and justifies the transition to specific business analytics strategies. A comparative analysis of three primary data analytics approaches – analog, digital, and big data – is conducted. A data analytics model is presented, reflecting the evolution and synergetic integration of these approaches. The study identifies a unified orientation in data analytics, aimed at improving business efficiency through the application and value of business analytics. The distinct characteristics of the transition from analog to digital approaches and from digital to big data analytics are determined, with a focus on changes in infrastructure, personnel competency requirements, and the applicability of data analysis methods. The study concludes with the recognition of the necessity of applying synergetic and systematic approaches to data analytics in the context of digital transformation, which contributes to maximizing the effectiveness of business analytics.

show abstract

Data-Centric Solutions for Addressing Big Data Veracity with Class Imbalance, High Dimensionality, and Class Overlapping

Bolívar,

García,

Alejo

et al. 2024

Applied Sciences

View full text Add to dashboard Cite

An innovative strategy for organizations to obtain value from their large datasets, allowing them to guide future strategic actions and improve their initiatives, is the use of machine learning algorithms. This has led to a growing and rapid application of various machine learning algorithms with a predominant focus on building and improving the performance of these models. However, this data-centric approach ignores the fact that data quality is crucial for building robust and accurate models. Several dataset issues, such as class imbalance, high dimensionality, and class overlapping, affect data quality, introducing bias to machine learning models. Therefore, adopting a data-centric approach is essential to constructing better datasets and producing effective models. Besides data issues, Big Data imposes new challenges, such as the scalability of algorithms. This paper proposes a scalable hybrid approach to jointly addressing class imbalance, high dimensionality, and class overlapping in Big Data domains. The proposal is based on well-known data-level solutions whose main operation is calculating the nearest neighbor using the Euclidean distance as a similarity metric. However, these strategies may lose their effectiveness on datasets with high dimensionality. Hence, the data quality is achieved by combining a data transformation approach using fractional norms and SMOTE to obtain a balanced and reduced dataset. Experiments carried out on nine two-class imbalanced and high-dimensional large datasets showed that our scalable methodology implemented in Spark outperforms the traditional approach.

show abstract

15 years of Big Data: a systematic literature review

Cited by 4 publications

References 195 publications

A Performance Evaluation Study on a Data Analytics Platform for Emergency Calls

A Performance Evaluation Study on a Data Analytics Platform for Emergency Calls

Comparative Analysis of Data Analytics Approaches in the Context of Business Digital Transformation

Data-Centric Solutions for Addressing Big Data Veracity with Class Imbalance, High Dimensionality, and Class Overlapping

Contact Info

Product

Resources

About