In this paper, we present a purpose-built data management system, MLdp, for all machine learning (ML) datasets. ML applications pose some unique requirements different from common conventional data processing applications, including but not limited to: data lineage and provenance tracking, rich data semantics and formats, integration with diverse ML frameworks and access patterns, trial-and-error driven data exploration and evolution, rapid experimentation, reproducibility of the model training, strict compliance and privacy regulations, etc. Current ML systems/services, often named MLaaS, to-date focus on the ML algorithms, and offer no integrated data management system. Instead, they require users to bring their own data and to manage their own data on either blob storage or on file systems. The burdens of data management tasks, such as versioning and access control, fall onto the users, and not all compliance features, such as terms of use, privacy measures, and auditing, are available. MLdp offers a minimalist and flexible data model for all varieties of data, strong version management to guarantee re-producibility of ML experiments, and integration with major ML frameworks. MLdp also maintains the data provenance to help users track lineage and dependencies among data versions and models in their ML pipelines. In addition to table-stake features, such as security, availability and scalability, MLdp's internal design choices are strongly influenced by the goal to support rapid ML experiment iterations, which
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.