2015
DOI: 10.14778/2824032.2824045

An architecture for compiling UDF-centric workflows

Abstract: Data analytics has recently grown to include increasingly sophisticated techniques, such as machine learning and advanced statistics. Users frequently express these complex analytics tasks as workflows of user-defined functions (UDFs) that specify each algorithmic step. However, given typical hardware configurations and dataset sizes, the core challenge of complex analytics is no longer sheer data volume but rather the computation itself, and the next generation of analytics frameworks must focus on optimizing…

Cited by 74 publications (57 citation statements)
References 23 publications
“…1536). This is in line with reports in [8], [9], [10] that the majority of clusters are rather small and have fewer than 50 nodes. Also, we experimented with raw datasets at the orders of hundreds gigabytes (the corresponding RDDs are 1-5 times larger), since, this is the typical dataset processed even in companies that are notorious for their big data application demands [11].…”
Section: Our Setting and The Benchmarking Applications (supporting)
confidence: 93%
“…Unless otherwise stated, we report median values in seconds. In most cases, we do not exploit the full computational power of MareNostrum but use a more limited amount of cores, motivated by the fact that most Hadoop clusters to date are relatively small [17].…”
Section: Results (mentioning)
confidence: 99%
“…In-Database advanced analytics. Recent work at the intersection of databases and machine learning are extensively trying to facilitate efficient in-database analytics and have built frameworks and systems to realize such an integration [7,14,15,17,55,56,57,58,59,60,61] (see [62] for a survey of various methods and systems). DAnA takes a step forward and exposes FPGA acceleration for in-Database analytics by providing a specialized component, Strider, that directly interfaces with the database to alleviate some of the shortcomings of the traditional Von-Neumann architecture in general purpose compute systems.…”
Section: Related Work (mentioning)
confidence: 99%