Data valuation is an essential task in a data marketplace: it aims to compensate data owners fairly for their contributions. There is increasing recognition in the machine learning community that the Shapley value, a foundational profit-sharing scheme in cooperative game theory, has major potential for valuing data, because it uniquely satisfies basic properties for fair credit allocation and has been shown to identify data sources that are useful or harmful to model performance. However, calculating the Shapley value requires access to the original data sources. It remains an open question how to design a real-world data marketplace that takes advantage of Shapley value-based data pricing while protecting privacy and allowing fair payments. In this paper, we propose the first prototype of a data marketplace that values data sources based on the Shapley value in a privacy-preserving manner and at the same time ensures fair payments. Our approach is enabled by a suite of innovations in both algorithm and system design. We first propose a Shapley value calculation algorithm that can be efficiently implemented via multiparty computation (MPC) circuits. The key idea is to learn a performance predictor that directly predicts the model performance corresponding to an input dataset without performing actual training. We then optimize the MPC circuit design based on the structure of the performance predictor. Finally, we incorporate fair payment into the MPC circuit to guarantee that the data the buyer pays for is exactly the data that has been valued. Our experimental results show that the proposed data valuation algorithm is as effective as the original, expensive one. Furthermore, the customized MPC protocol is efficient and scalable.
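To make the Shapley value concrete, the sketch below computes it exactly for a handful of data sources by enumerating every coalition and averaging each source's weighted marginal contribution. It is not the paper's MPC algorithm. The utility function `toy_predictor` is a hypothetical stand-in for a learned performance predictor: it maps a set of data sources to a predicted model accuracy without any training. The source names and contribution numbers are likewise illustrative assumptions. Exact enumeration costs O(2^n) utility calls per source, so it only suits small n.

```python
import math
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Hashable, Sequence


def shapley_values(
    players: Sequence[Hashable],
    utility: Callable[[FrozenSet[Hashable]], float],
) -> Dict[Hashable, float]:
    """Exact Shapley value of each player (data source) under `utility`."""
    n = len(players)
    values: Dict[Hashable, float] = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        # k = size of the coalition S that does not contain p.
        for k in range(n):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when joining coalition S.
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values


def toy_predictor(sources: FrozenSet[Hashable]) -> float:
    """Hypothetical performance predictor: predicted accuracy for a set of sources."""
    base_accuracy = 0.5
    contribution = {"A": 0.3, "B": 0.1, "C": -0.05}  # "C" is a harmful source
    return base_accuracy + sum(contribution[s] for s in sources)


if __name__ == "__main__":
    phi = shapley_values(["A", "B", "C"], toy_predictor)
    for source, value in phi.items():
        print(source, round(value, 4))
```

Because this toy utility is additive, each source's Shapley value equals its own contribution, and the harmful source receives a negative value. The values also sum to the utility of all sources minus the utility of none (the efficiency property).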
An encrypted database is an innovative technology proposed to solve the data confidentiality problem in cloud-based database systems. It allows a data owner to encrypt its database before uploading it to the service provider, and it allows the service provider to execute SQL queries over the encrypted data. Most existing encrypted databases (e.g., CryptDB, SOSP '11) do not support data interoperability: they cannot process complex queries that require piping the output of one operation into another. To the best of our knowledge, SDB (SIGMOD '14) is the only encrypted database that achieves data interoperability. Unfortunately, we find that SDB is not secure. In this paper, we revisit the security of SDB and propose a ciphertext-only attack named the co-prime attack. It successfully attacks the common operations supported by SDB, including addition, comparison, sum, equi-join, and group-by. We evaluate our attack on three real-world benchmarks. For columns that support addition and comparison, we recover 84.9% to 99.9% of the plaintexts. For columns that support sum, equi-join, and group-by, we recover 100% of the plaintexts. In addition, we provide potential countermeasures that can prevent the attacks against sum, equi-join, group-by, and addition. Preventing the attack against comparison remains an open problem.