Development of machine learning (ML) workflows is a tedious process of iterative experimentation: developers repeatedly make changes to workflows until the desired accuracy is attained. We describe our vision for a "human-in-the-loop" ML system that accelerates this process: by intelligently tracking changes and intermediate results over time, such a system can enable rapid iteration, quick responsive feedback, introspection and debugging, and background execution and automation. Finally, we describe Helix, our preliminary attempt at such a system, which has already led to speedups of up to 10× on typical iterative workflows against competing systems.
Machine learning workflow development is a process of trial-and-error: developers iterate on workflows by testing out small modifications until the desired accuracy is achieved. Unfortunately, existing machine learning systems focus narrowly on model training, a small fraction of the overall development time, and neglect to address iterative development. We propose HELIX, a machine learning system that optimizes execution across iterations, intelligently caching and reusing or recomputing intermediates as appropriate. HELIX captures a wide variety of application needs within its Scala DSL, with succinct syntax defining unified processes for data preprocessing, model specification, and learning. We demonstrate that the reuse problem can be cast as a MAX-FLOW problem, while the caching problem is NP-HARD. We develop effective lightweight heuristics for the latter. Empirical evaluation shows that HELIX not only handles a wide variety of use cases in one unified workflow but is also much faster, providing run time reductions of up to 19× over state-of-the-art systems, such as DeepDive or KeystoneML, on four real-world applications in natural language processing, computer vision, and the social and natural sciences.
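To make the reuse problem concrete, the sketch below enumerates every way to load, compute, or prune each node of a small workflow DAG and keeps the cheapest feasible plan. It is only an illustration of the decision being optimized, not HELIX's method: the abstract states that HELIX solves this via a MAX-FLOW formulation, whereas this sketch uses brute force. The node names, the costs, and the feasibility rules (a computed node needs all of its parents available, and outputs cannot be pruned) are assumptions made for the example.

```python
from itertools import product

# Hypothetical toy workflow: node -> list of parent nodes.
PARENTS = {
    "parse": [],
    "features": ["parse"],
    "model": ["features"],
    "eval": ["model"],
}
# Hypothetical costs in seconds. A load cost of None means that no
# cached copy from a previous iteration exists for that node.
COMPUTE_COST = {"parse": 10.0, "features": 30.0, "model": 60.0, "eval": 5.0}
LOAD_COST = {"parse": 4.0, "features": 8.0, "model": None, "eval": None}
OUTPUTS = {"eval"}  # nodes whose results the developer needs


def plan_cost(plan):
    """Return the total cost of a plan, or None if the plan is infeasible."""
    total = 0.0
    for node, state in plan.items():
        if state == "load":
            if LOAD_COST[node] is None:
                return None  # nothing cached to load
            total += LOAD_COST[node]
        elif state == "compute":
            # Assumed rule: computing a node needs every parent available.
            if any(plan[p] == "prune" for p in PARENTS[node]):
                return None
            total += COMPUTE_COST[node]
        elif node in OUTPUTS:
            return None  # an output node may not be pruned
    return total


def best_plan():
    """Brute-force search over all 3^n state assignments (toy sizes only)."""
    nodes = list(PARENTS)
    best = None
    for states in product(("load", "compute", "prune"), repeat=len(nodes)):
        plan = dict(zip(nodes, states))
        cost = plan_cost(plan)
        if cost is not None and (best is None or cost < best[0]):
            best = (cost, plan)
    return best


cost, plan = best_plan()
print(cost, plan)
```

With these made-up numbers the cheapest plan loads the cached `features`, skips `parse` entirely, and recomputes only `model` and `eval`. The exponential search is exactly what a polynomial-time formulation avoids on realistic workflows.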
Data application developers and data scientists spend an inordinate amount of time iterating on machine learning (ML) workflows, modifying the data pre-processing, model training, and post-processing steps via trial-and-error to achieve the desired model performance. Existing work on accelerating machine learning focuses on speeding up one-shot execution of workflows, failing to address the incremental and dynamic nature of typical ML development. We propose HELIX, a declarative machine learning system that accelerates iterative development by optimizing workflow execution end-to-end and across iterations. HELIX minimizes the runtime per iteration via program analysis and intelligent reuse of previous results, which are selectively materialized (trading off the cost of materialization against potential future benefits) to speed up future iterations. Additionally, HELIX offers a graphical interface to visualize workflow DAGs and compare versions to facilitate iterative development. Through two ML applications, in classification and in structured prediction, attendees will experience the succinctness of HELIX's programming interface and the speed and ease of iterative development using HELIX. In our evaluations, HELIX achieved up to an order of magnitude reduction in cumulative run time compared to state-of-the-art machine learning tools.
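The materialization trade-off can be illustrated with a simple break-even check. This is a generic cost-benefit sketch, not the rule HELIX uses (the abstract does not specify it); the function name, its parameters, and the assumption that the number of future reuses can be estimated are all hypothetical.

```python
def should_materialize(recompute_cost, write_cost, load_cost, expected_reuses):
    """Decide whether storing an intermediate result is likely to pay off.

    All costs are in the same unit (e.g. seconds). `expected_reuses` is an
    assumed estimate of how many future iterations will need this result.
    """
    # Pay once to write, then pay the load cost on every reuse.
    cost_if_stored = write_cost + expected_reuses * load_cost
    # Otherwise pay the full recomputation cost on every reuse.
    cost_if_not_stored = expected_reuses * recompute_cost
    return cost_if_stored < cost_if_not_stored


# An expensive intermediate that is cheap to store and load is worth keeping.
print(should_materialize(recompute_cost=30.0, write_cost=5.0,
                         load_cost=8.0, expected_reuses=1))
# A cheap intermediate is not worth the write.
print(should_materialize(recompute_cost=2.0, write_cost=5.0,
                         load_cost=1.0, expected_reuses=2))
```

In practice the number of future reuses is unknown when the decision is made, which is part of what makes selective materialization hard and motivates heuristics.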