A Cost-based Optimizer for Gradient Descent Optimization

Kaoudi, Zoi; Quiané-Ruiz, Jorge-Arnulfo; Thirumuruganathan, Saravanan; Chawla, Sanjay; Agrawal, Divy

doi:10.1145/3035918.3064042

Cited by 41 publications

(37 citation statements)

References 15 publications


“…However, Postgres is not as good as Spark for general purpose batch processing where parallel full scans are the key performance factor. Several studies have shown this kind of performance differences [20,34,40,53,61]. Diversity as Common Ground.…”

Section: The Dark Side Of Big Data (mentioning)

confidence: 97%

“…Moreover, today's data analytics is moving beyond the limits of a single platform. For example: (i) IBM reported that North York hospital needs to process 50 diverse datasets, which run on a dozen different platforms [38]; (ii) Airlines need to analyze large datasets, which are produced by different departments, are of different data formats, and reside on multiple data sources, to produce global reports for decision makers [9]; (iii) Oil & Gas companies need to process large amounts of diverse data spanning various platforms [19,36]; (iv) Several data warehouse applications require data to be moved from a MapReduce-like system into a DBMS for further analysis [28,56]; (v) Business intelligence typically requires an analytic pipeline composed of different platforms [58]; and (vi) Using multiple platforms for machine learning improves performance significantly [20,40]. Status Quo.…”

Section: The Dark Side Of Big Data (mentioning)

confidence: 99%

“…In the former case, we aim at seamlessly integrating all data analytic activity governing an aircraft; In the latter case, we aim at reducing the effort scientists need for building data analytic pipelines while at the same time speeding up the running time. Note that several papers show different aspects of Rheem: the vision behind it [17]; its optimizer [43]; its inequality join algorithm [42]; and a couple of its applications [40,41]. A couple of demo papers showcase the benefits of Rheem [16] and its interface [47].…”

Section: The Dark Side Of Big Data (mentioning)

confidence: 99%

“…In contrast to existing systems [29,30,34,58,62], Rheem helps users in all above cases. The design of our system has been mainly driven by four applications: a data cleaning application, BigDansing [41]; a machine learning application, ML4all [40]; a database application, xDB; and an end-to-end data discovery and preparation application, Data Civilizer [32]. We use these applications to showcase the benefits of performing cross-platform data processing, instead of single-platform data processing, in terms of both performance and ease of use.…”

Section: Cross-platform Processing (mentioning)

confidence: 99%


RHEEM: enabling cross-platform data processing

et al. 2018

Self Cite


Solving business problems increasingly requires going beyond the limits of a single data processing platform (platform for short), such as Hadoop or a DBMS. As a result, organizations typically perform tedious and costly tasks to juggle their code and data across different platforms. Addressing this pain and achieving automatic cross-platform data processing is quite challenging: finding the most efficient platform for a given task requires quite good expertise for all the available platforms. We present Rheem, a general-purpose cross-platform data processing system that decouples applications from the underlying platforms. It not only determines the best platform to run an incoming task, but also splits the task into subtasks and assigns each subtask to a specific platform to minimize the overall cost (e.g., runtime or monetary cost). It features (i) a robust interface to easily compose data analytic tasks; (ii) a novel cost-based optimizer able to find the most efficient platform in almost all cases; and (iii) an executor to efficiently orchestrate tasks over different platforms. As a result, it allows users to focus on the business logic of their applications rather than on the mechanics of how to compose and execute them. Using different real-world applications with Rheem, we demonstrate how cross-platform data processing can accelerate performance by more than one order of magnitude compared to single-platform data processing.
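The abstract's idea of splitting a task into subtasks and assigning each to a platform so that overall cost is minimized can be illustrated with a toy sketch. This is not Rheem's optimizer: the function name `assign_platforms`, the per-stage cost estimates, the flat `move_cost` penalty for switching platforms, and the restriction to a linear pipeline are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def assign_platforms(
    op_costs: List[Dict[str, float]],
    move_cost: float,
) -> Tuple[float, List[str]]:
    """Pick one platform per pipeline stage, minimizing total estimated cost.

    op_costs[i][p] is a hypothetical cost estimate for running stage i on
    platform p. move_cost is a flat, hypothetical penalty charged whenever
    two consecutive stages run on different platforms.
    """
    # best[p] = (cheapest cost of any plan whose last stage runs on p, that plan)
    best = {p: (c, [p]) for p, c in op_costs[0].items()}
    for stage in op_costs[1:]:
        new_best = {}
        for p, c in stage.items():
            # Extend every previous plan by running this stage on platform p.
            candidates = [
                (prev_cost + (0.0 if q == p else move_cost) + c, plan + [p])
                for q, (prev_cost, plan) in best.items()
            ]
            new_best[p] = min(candidates, key=lambda t: t[0])
        best = new_best
    return min(best.values(), key=lambda t: t[0])


# Made-up estimates: the first stage is cheap on Postgres, later stages on Spark.
costs = [
    {"spark": 10.0, "postgres": 2.0},
    {"spark": 3.0, "postgres": 20.0},
    {"spark": 4.0, "postgres": 30.0},
]
total, plan = assign_platforms(costs, move_cost=5.0)
print(total, plan)
```

With these made-up numbers the mixed plan costs less than running every stage on either platform alone, which is the effect the abstract describes. A real optimizer would also have to estimate the costs themselves and handle non-linear task graphs.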



“…The least-squares method is sensitive to noise and suitable for relatively small samples [30,31]. Gradient descent algorithms are often used as the core methods of training algorithms in the field of machine learning, and they are commonly used to recursively approximate a minimum deviation model, such as regression and artificial neural networks [32–37]. The batch gradient descent (BGD) algorithm is a conventional method of gradient descent that is widely used in the field of machine learning [35,38–40].…”

Section: Introduction (mentioning)

confidence: 99%

Downscaling Precipitation in the Data-Scarce Inland River Basin of Northwest China Based on Earth System Data Products

Zuo, Chen, et al. 2019

Atmosphere


Precipitation is a key climatic variable that connects the processes of atmosphere and land surface, and it plays a leading role in the water cycle. However, the vast area of Northwest China, its complex geographical environment, and its scarce observation data make it difficult to deeply understand the temporal and spatial variation of precipitation. This paper establishes a statistical downscaling model to downscale the monthly precipitation in the inland river basin of Northwest China with the Tarim River Basin (TRB) as a typical representation. This method combines polynomial regression and machine learning, and it uses the batch gradient descent (BGD) algorithm to train the regression model. We downscale the monthly precipitation and obtain a dataset from January 2001 to December 2017 with a spatial resolution of 1 km × 1 km. The results show that the downscaling model presents a good performance in precipitation simulation with a high resolution, and it is more effective than ordinary polynomial regression. We also investigate the temporal and spatial variations of precipitation in the TRB based on the downscaling dataset. Analyses illustrate that the annual precipitation in the southern foothills of the Tianshan Mountains and the North Kunlun Mountains showed a significant upward trend during the study periods, while the annual precipitation in the central plains presented a significant downward trend.
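For readers unfamiliar with the technique the abstract names, the following is a minimal sketch of batch gradient descent fitting a polynomial regression. It is not the paper's downscaling model: the helper names, the learning rate, the epoch count, and the synthetic data are assumptions chosen only to keep the example small.

```python
from typing import List, Sequence


def poly_features(x: float, degree: int) -> List[float]:
    """Return the polynomial feature vector [1, x, x^2, ..., x^degree]."""
    return [x ** d for d in range(degree + 1)]


def batch_gradient_descent(
    xs: Sequence[float],
    ys: Sequence[float],
    degree: int = 2,
    lr: float = 0.1,
    epochs: int = 5000,
) -> List[float]:
    """Fit polynomial coefficients by full-batch gradient descent on squared error."""
    feats = [poly_features(x, degree) for x in xs]
    n = len(xs)
    w = [0.0] * (degree + 1)
    for _ in range(epochs):
        # Residuals over the whole dataset: every update uses all samples,
        # which is what distinguishes batch from stochastic gradient descent.
        residuals = [
            sum(wj * fj for wj, fj in zip(w, f)) - y
            for f, y in zip(feats, ys)
        ]
        # Gradient of (1 / 2n) * sum(residual^2) with respect to each weight.
        grad = [
            sum(r * f[j] for r, f in zip(residuals, feats)) / n
            for j in range(degree + 1)
        ]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Synthetic, noise-free data from y = 1 + 2x + 3x^2 (illustrative only).
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [1 + 2 * x + 3 * x * x for x in xs]
weights = batch_gradient_descent(xs, ys)
print([round(wj, 4) for wj in weights])
```

Inputs are kept in [-1, 1] so that a fixed learning rate converges. With unscaled predictors, feature normalization or a smaller step size would typically be needed.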
