Botong Huang scite author profile

Botong Huang

5 Publications

29 Citation Statements Received

73 Citation Statements Given

How they've been cited

115

How they cite others

120

Affiliations

Alibaba Group (China), Duke University, Microsoft Research (United Kingdom)

Publications

Resource Elasticity for Large-Scale Machine Learning

Huang

Boehm

Tian

et al. 2015

Declarative large-scale machine learning (ML) aims at flexible specification of ML algorithms and automatic generation of hybrid runtime plans ranging from single node, in-memory computations to distributed computations on MapReduce (MR) or similar frameworks. State-of-the-art compilers in this context are very sensitive to memory constraints of the master process and MR cluster configuration. Different memory configurations can lead to significant performance differences. Interestingly, resource negotiation frameworks like YARN allow us to explicitly request preferred resources including memory. This capability enables automatic resource elasticity, which is not just important for performance but also removes the need for a static cluster configuration, which is always a compromise in multi-tenancy environments. In this paper, we introduce a simple and robust approach to automatic resource elasticity for large-scale ML. This includes (1) a resource optimizer to find near-optimal memory configurations for a given ML program, and (2) dynamic plan migration to adapt memory configurations during runtime. These techniques adapt resources according to data, program, and cluster characteristics. Our experiments demonstrate significant improvements of up to 21x without unnecessary over-provisioning and with low optimization overhead.

Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental Computing

Wang

Zeng

Huang

et al. 2020

As the primary approach to deriving decision-support insights, automated recurring routine analytic jobs account for a major part of cluster resource usage in modern enterprise data warehouses. These recurring routine jobs usually have stringent schedules and deadlines determined by external business logic, and thus cause dreadful resource skew and severe resource over-provisioning in the cluster. In this paper, we present Grosbeak, a novel data warehouse that supports resource-aware incremental computing to process recurring routine jobs, smooths the resource skew, and optimizes the resource usage. Unlike batch processing in traditional data warehouses, Grosbeak leverages the fact that data is continuously ingested. It breaks an analysis job into small batches that incrementally process the progressively available data, and schedules these small-batch jobs intelligently when the cluster has free resources. In this demonstration, we showcase Grosbeak using real-world analysis pipelines. Users can interact with the data warehouse by registering recurring queries and observing the incremental scheduling behavior and smoothed resource usage pattern.

Cumulon

Huang

Babu

Yang

2013

We present Cumulon, a system designed to help users rapidly develop and intelligently deploy matrix-based big-data analysis programs in the cloud. Cumulon features a flexible execution model and new operators especially suited for such workloads. We show how to implement Cumulon on top of Hadoop/HDFS while avoiding limitations of MapReduce, and demonstrate Cumulon's performance advantages over existing Hadoop-based systems for statistical data analysis. To support intelligent deployment in the cloud according to time/budget constraints, Cumulon goes beyond database-style optimization to make choices automatically on not only physical operators and their parameters, but also hardware provisioning and configuration settings. We apply a suite of benchmarking, simulation, modeling, and search techniques to support effective cost-based optimization over this rich space of deployment plans.

Baihe: SysML Framework for AI-driven Databases

Pfadler

Zhu

Chen

et al. 2021

Preprint

We present Baihe, a SysML Framework for AI-driven Databases. Using Baihe, an existing relational database system may be retrofitted to use learned components for query optimization or other common tasks, such as learned structures for indexing. To ensure the practicality and real-world applicability of Baihe, its high-level architecture is based on the following requirements: separation from the core system, minimal third-party dependencies, robustness, stability, and fault tolerance, as well as configurability. Based on the high-level architecture, we then describe a concrete implementation of Baihe for PostgreSQL and present example use cases for learned query optimizers. To serve both practitioners and researchers in the DB and AI4DB communities, Baihe for PostgreSQL will be released under an open source license.

Tempura: a general cost-based optimizer framework for incremental data processing (Journal Version)

et al. 2023
