2023
DOI: 10.3390/s23020564
Smart Data Placement Using Storage-as-a-Service Model for Big Data Pipelines

Abstract: Big data pipelines are developed to process data characterized by one or more of the three big data features, commonly known as the three Vs (volume, velocity, and variety), through a series of steps (e.g., extract, transform, and move), laying the groundwork for the use of advanced analytics and ML/AI techniques. The computing continuum (i.e., cloud/fog/edge) allows access to a virtually infinite amount of resources, where data pipelines could be executed at scale; however, the implementation of data pipelines on …

Cited by 10 publications (8 citation statements) | References 48 publications
“…In addressing the complexities of multi-cloud optimization, the study conducted by Quddus et al. [15] presents a comprehensive framework aimed at minimizing operational costs within online social networks, leveraging dynamic data replication and strategic placement algorithms within a dual-level multi-cloud infrastructure. This research examines the multifaceted challenges inherent in multi-cloud environments, including vendor lock-in, service availability discrepancies, suboptimal cost utilization, and the exacerbation of latency during data replication processes across diverse cloud service platforms.…”
Section: Related Work
confidence: 99%
“…The paper [15] explores a novel approach to optimize cloud storage costs by utilizing a rule-based classification system. This system classifies data across four distinct storage tiers (premium, hot, cold, and archive) based on factors such as access frequency, size, and age of data.…”
Section: Related Work
confidence: 99%
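The excerpt does not reproduce the actual tiering rules from [15], so the following is only a minimal sketch of what such a rule-based classifier could look like; the tier names come from the statement above, while the thresholds on access frequency, size, and age are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class DataObject:
    """Metadata used by the (hypothetical) tiering rules."""
    accesses_per_day: float   # observed access frequency
    size_gb: float            # object size
    age_days: int             # days since last modification

def classify_tier(obj: DataObject) -> str:
    """Assign one of four storage tiers using illustrative thresholds.

    The tier names follow the citation statement above; the numeric
    cut-offs are assumptions, not values from the cited paper.
    """
    if obj.accesses_per_day >= 100 and obj.size_gb <= 1:
        return "premium"   # small, very frequently accessed data
    if obj.accesses_per_day >= 1:
        return "hot"       # regularly accessed data
    if obj.age_days <= 180:
        return "cold"      # rarely accessed but still recent
    return "archive"       # old data kept mainly for retention

# Example: an old, rarely touched backup lands in the archive tier.
print(classify_tier(DataObject(accesses_per_day=0.01, size_gb=250, age_days=400)))
```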
“…For this evaluation model, four different parameters are selected in addition to the user weights. These parameters are as follows: cost (i.e., based on storage, bandwidth, and READ and WRITE operations; see [3]), proximity (i.e., using IP ranges provided by the cloud service providers and GeoIP), network performance (i.e., throughput), and the impact of server-side encryption (i.e., its effect on performance). Figure 3 shows the evaluation matrix for the proposed method.…”
Section: Integration of Data Pipelines and StaaS
confidence: 99%
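As a rough illustration of how the four parameters and the user weights could be combined into an evaluation matrix and a ranking, the sketch below scores each storage option with a weighted sum; the normalization scheme, the example providers, and the numbers are assumptions for illustration, not values or methods taken from [3].

```python
# Each storage option is scored on the four parameters named in the statement
# above; lower is better for cost and latency (a stand-in for proximity),
# higher is better for throughput and encryption performance. The sample
# figures and weights are illustrative assumptions, not from the cited work.

OPTIONS = {
    "provider_a": {"cost": 0.023, "latency_ms": 12.0, "throughput": 480.0, "enc_perf": 0.92},
    "provider_b": {"cost": 0.018, "latency_ms": 35.0, "throughput": 310.0, "enc_perf": 0.88},
    "provider_c": {"cost": 0.030, "latency_ms": 8.0,  "throughput": 520.0, "enc_perf": 0.95},
}

# Hypothetical user weights (sum to 1).
WEIGHTS = {"cost": 0.4, "latency_ms": 0.2, "throughput": 0.3, "enc_perf": 0.1}
LOWER_IS_BETTER = {"cost", "latency_ms"}

def normalize(param: str, value: float, values: list[float]) -> float:
    """Min-max normalize to [0, 1], flipping parameters where lower is better."""
    lo, hi = min(values), max(values)
    score = 0.5 if hi == lo else (value - lo) / (hi - lo)
    return 1.0 - score if param in LOWER_IS_BETTER else score

def rank_storage(options: dict, weights: dict) -> list[tuple[str, float]]:
    """Return storage options sorted by weighted score, best first."""
    columns = {p: [o[p] for o in options.values()] for p in weights}
    scored = {
        name: sum(weights[p] * normalize(p, vals[p], columns[p]) for p in weights)
        for name, vals in options.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

print(rank_storage(OPTIONS, WEIGHTS))
```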
“…constraints. To this end, we first propose an approach to realize big data pipelines on hybrid infrastructure, i.e., computation on an on-premises server or on a specific cloud, but integrated with StaaS; and second, we develop a ranking method to find the most suitable storage facility dynamically based on the user's requirements, including cost, proximity, network performance, impact of server-side encryption, and user weights [3].…”
Section: Introduction
confidence: 99%
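To make the two ideas concrete (computation on-premises or on a specific cloud, with output handed off to a StaaS backend chosen dynamically), here is a hypothetical sketch of a single pipeline step; the run_step helper, the select_storage callback, and the StorageBackend interface are illustrative stand-ins, not the interface defined in [3].

```python
from typing import Callable, Protocol

class StorageBackend(Protocol):
    """Minimal StaaS interface assumed for this sketch (not from [3])."""
    name: str
    def put(self, key: str, data: bytes) -> None: ...

def run_step(
    step: Callable[[bytes], bytes],
    payload: bytes,
    select_storage: Callable[[dict], StorageBackend],
    user_weights: dict,
    output_key: str,
) -> str:
    """Execute one pipeline step wherever this code runs (on-premises or on a
    chosen cloud) and hand the result to the StaaS backend picked dynamically."""
    result = step(payload)                   # computation on hybrid infrastructure
    backend = select_storage(user_weights)   # dynamic storage selection (hypothetical)
    backend.put(output_key, result)          # integration with StaaS for the output
    return f"stored {output_key} on {backend.name}"

# Usage sketch: a trivial transform step and an in-memory stand-in backend.
class InMemoryBackend:
    name = "in-memory-demo"
    def __init__(self) -> None:
        self.store: dict[str, bytes] = {}
    def put(self, key: str, data: bytes) -> None:
        self.store[key] = data

demo_backend = InMemoryBackend()
print(run_step(
    step=lambda b: b.upper(),
    payload=b"raw records",
    select_storage=lambda weights: demo_backend,   # would call the ranking method in practice
    user_weights={"cost": 0.5, "proximity": 0.5},
    output_key="step1/output.bin",
))
```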
“…A big data pipeline is a series of steps and tools used to collect, transform, and load data from various sources into a system where it can be accessed and processed [4]. In the context of performance on an MSME e-commerce platform, a big data pipeline plays a role in overcoming obstacles in organizing large data, converting it into a more structured format, and storing it in an appropriate system [5], so that more targeted decisions can be made based on accurate data analysis.…”
Section: Introduction
confidence: 99%
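As a minimal illustration of the collect, transform, and load steps described above, the sketch below reads hypothetical order records, converts them into a more structured format, and writes them out as JSON lines; the source, the cleaning rule, and the target format are placeholders, not part of the cited works.

```python
import csv
import io
import json

def collect(raw_csv: str) -> list[dict]:
    """Collect: read semi-structured records from a source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows: list[dict]) -> list[dict]:
    """Transform: convert to a more structured format and drop malformed rows."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append({"order_id": row["order_id"], "amount": float(row["amount"])})
        except (KeyError, ValueError):
            continue  # skip records that cannot be structured
    return cleaned

def load(rows: list[dict]) -> str:
    """Load: store the structured records in a target system (here, JSON lines)."""
    return "\n".join(json.dumps(r) for r in rows)

raw = "order_id,amount\nA1,19.90\nA2,not-a-number\nA3,5.00"
print(load(transform(collect(raw))))
```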