Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, 2013
DOI: 10.1145/2463676.2465273

Cumulon

Abstract: We present Cumulon, a system designed to help users rapidly develop and intelligently deploy matrix-based big-data analysis programs in the cloud. Cumulon features a flexible execution model and new operators especially suited for such workloads. We show how to implement Cumulon on top of Hadoop/HDFS while avoiding limitations of MapReduce, and demonstrate Cumulon's performance advantages over existing Hadoop-based systems for statistical data analysis. To support intelligent deployment in the cloud according …

citations
Cited by 58 publications
(2 citation statements)
references
References 24 publications
“…Therefore, we introduce resource time (in Sect. 3) as one of our metrics, representing both the resource utilization and resource cost [25]. It is a helpful indicator for renting pay-as-you-go services and can assist in making cost-effective decisions on vcore and memory configuration.…”
Section: Discussion Conclusion and Future Work
Citation type: mentioning
confidence: 99%
“…Spark DataFrame Profile 4 generates statistics (e.g., descriptive statistics, quantiles, histogram) from Spark DataFrames. Cumulon [47] is an end-to-end system to optimize the cost of calculating statistics on the cloud. Sketch [48] is a system for aggregation on distributed data sets.…”
Section: Big Data Exploration and Profiling
Citation type: mentioning
confidence: 99%