We propose a novel privacy-preserving random kernel approximation based on a data matrix $A \in \mathbb{R}^{m \times n}$ whose rows are divided into privately owned blocks. Each block of rows belongs to a different entity that is unwilling to share its rows or make them public. We wish to obtain an accurate function approximation for a given $y \in \mathbb{R}^m$ corresponding to the $m$ rows of $A$. Our approximation of $y$ is a real function on $\mathbb{R}^n$ evaluated at each row of $A$ and is based on the concept of a reduced kernel $K(A, B')$, where $B'$ is the transpose of a completely random matrix $B$. The proposed linear-programming-based approximation, which is public but does not reveal the privately held data matrix $A$, has accuracy comparable to that of an ordinary kernel approximation based on a publicly disclosed data matrix $A$.

Keywords: privacy-preserving approximation, random kernels, support vector machines, linear programming
INTRODUCTION

The problem addressed in this work is that of obtaining an approximation to a given vector $y \in \mathbb{R}^m$ of function values corresponding to the $m$ rows of a data matrix $A \in \mathbb{R}^{m \times n}$ that represents $m$ points in the $n$-dimensional real space $\mathbb{R}^n$. The matrix $A$ is partitioned into $q$ blocks of rows belonging to $q$ entities that are unwilling to share their data or make them public. The motivation for this work arises from similar problems in classification theory, where the data, corresponding to rows of a data matrix, is likewise held by various private entities and hence referred to as horizontally partitioned data. Thus, in [19, 15] privacy-preserving support vector machine (SVM) classifiers were obtained for such data, while in [20] induction tree classifiers were generated for similar problems. Other privacy-preserving classification techniques include cryptographically private SVMs [7], wavelet-based distortion [10], and rotation perturbation [3]. There is also a substantial body of research on privacy preservation in linear programming, such as [1, 12, 13]. However, there do not appear to be any privacy-preserving applications to approximation problems in the literature. This is the problem we address here, as follows.

In this work we propose an efficient privacy-preserving approximation (PPA) for horizontally partitioned data that is based on the following two ideas. First, for a given data matrix $A \in \mathbb{R}^{m \times n}$, instead of using the usual kernel function $K(A, A') : \mathbb{R}^{m \times n} \times \mathbb{R}^{n \times m} \longrightarrow \mathbb{R}^{m \times m}$ for constructing a linear or nonlinear approximation of a given $y \in \mathbb{R}^m$ corresponding to the $m$ rows of $A$, we use a random kernel [9, 8] $K(A, B') : \mathbb{R}^{m \times n} \times \mathbb{R}^{n \times m} \longrightarrow \mathbb{R}^{m \times m}$, $m < n$, where $B$ is a completely random matrix that is publicly disclosed. Such a random kernel will be shown to completely hide the data matrix $A$. Second, each entity $i \in \{1, \ldots, q\}$ makes public only the kernel function $K(A_i, B')$ of its privately held block of rows $A_i$, as sketched in the example below.
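To make these two ideas concrete, the following Python sketch is an illustration of ours, not code from the paper: the function name `random_kernel`, the Gaussian kernel form, and the width parameter `mu` are all assumptions. It shows each entity computing $K(A_i, B')$ locally against a shared public random $B$, verifies that stacking the published blocks reproduces $K(A, B')$, and, for the linear-kernel case $K(A, B') = AB'$, illustrates numerically why $m < n$ leaves $A$ hidden.

```python
import numpy as np

def random_kernel(X, Bt, mu=0.1):
    """Gaussian random kernel K(X, B')_{ij} = exp(-mu * ||X_i - B_j||^2).

    Bt is the transpose B' of the public random matrix B (shape (n, m)).
    The kernel form and mu are illustrative assumptions, not the paper's spec.
    """
    B = Bt.T
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :]
                - 2.0 * X @ B.T)
    return np.exp(-mu * sq_dists)

rng = np.random.default_rng(0)
m, n, q = 9, 20, 3                # m < n, as required for hiding A
A_blocks = [rng.standard_normal((m // q, n)) for _ in range(q)]  # private rows
A = np.vstack(A_blocks)           # full data matrix, never disclosed

B = rng.standard_normal((m, n))   # completely random, publicly disclosed
Bt = B.T

# Each entity i publishes only K(A_i, B'); stacking the published blocks
# yields K(A, B') because row i of the kernel depends only on row i of A.
public_blocks = [random_kernel(Ai, Bt) for Ai in A_blocks]
K_public = np.vstack(public_blocks)
assert np.allclose(K_public, random_kernel(A, Bt))

# Hiding, for the linear kernel A B': the m*m public entries cannot pin down
# the m*n private entries when m < n. Any Z whose rows lie in the null space
# of B' (dimension n - m > 0) gives (A + Z) B' = A B'.
Z = rng.standard_normal((m, n))
Z -= (Z @ Bt) @ np.linalg.pinv(Bt)        # project rows onto null space of B'
assert np.allclose((A + Z) @ Bt, A @ Bt)  # publicly indistinguishable from A
```

The final two assertions illustrate the hiding claim made above in its simplest (linear-kernel) form: since $m < n$, infinitely many matrices $A + Z$ produce the same public product $AB'$, so the disclosed kernel blocks do not determine the private rows.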