In many signal processing and machine learning applications, datasets containing private information are held at different locations, requiring the development of distributed privacy-preserving algorithms. Tensor and matrix factorizations are key components of many processing pipelines. In the distributed setting, differentially private algorithms lose utility because each site must introduce noise to guarantee privacy. This paper designs new and improved distributed and differentially private algorithms for two popular matrix and tensor factorization methods: principal component analysis (PCA) and orthogonal tensor decomposition (OTD). The new algorithms employ a correlated noise design scheme to alleviate the effects of noise and can achieve the same noise level as the centralized scenario. Experiments on synthetic and real data illustrate the regimes in which the correlated noise allows performance matching the centralized setting, outperforming previous methods and demonstrating that meaningful utility is possible while guaranteeing differential privacy.
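The effect of correlated noise on an averaged statistic can be illustrated with a small simulation. The sketch below is not the paper's algorithm: the function names, the centering trick used to produce zero-sum noise, and the split of the per-site noise variance are all assumptions chosen for illustration, and the code omits both the calibration of the noise to a privacy budget and any secure mechanism for generating the zero-sum terms. It only shows why noise that sums to zero across sites drops out of the aggregate.

```python
import numpy as np

def zero_sum_noise(num_sites, dim, scale, rng):
    """Hypothetical helper: Gaussian noise whose sum over sites is zero."""
    raw = rng.normal(0.0, scale, size=(num_sites, dim))
    # Centering across sites forces the columns to sum to zero.
    return raw - raw.mean(axis=0, keepdims=True)

def aggregate(local_stats, tau_local, rng, correlated=True):
    """Average noisy per-site statistics (one row per site).

    tau_local is the per-site noise standard deviation, assumed to be
    already calibrated elsewhere; that calibration is not shown here.
    """
    num_sites, dim = local_stats.shape
    if correlated:
        # Zero-sum term: cancels exactly in the average.
        e = zero_sum_noise(num_sites, dim, tau_local, rng)
        # Small independent term that survives aggregation (assumed split).
        g = rng.normal(0.0, tau_local / np.sqrt(num_sites),
                       size=(num_sites, dim))
        noisy = local_stats + e + g
    else:
        # Conventional approach: fully independent noise at every site.
        noisy = local_stats + rng.normal(0.0, tau_local,
                                         size=(num_sites, dim))
    return noisy.mean(axis=0)

rng = np.random.default_rng(0)
stats = np.zeros((10, 20000))  # all-zero statistics, so the output is pure noise
var_correlated = aggregate(stats, 1.0, rng, correlated=True).var()
var_independent = aggregate(stats, 1.0, rng, correlated=False).var()
print(var_correlated, var_independent)
```

Under these assumptions each site still adds noise of total variance `tau_local**2`, but the residual noise variance in the average is smaller by a factor equal to the number of sites than it is with independent noise.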
Building good feature representations and learning hidden source models typically requires large sample sizes. In many applications, however, the size of the sample at an individual data holder may not be sufficient. One such application is neuroimaging analyses for mental health disorders: there are many individual research groups, each with a moderate number of subjects. Pooling such data can enable efficient feature learning, but privacy concerns prevent sharing the underlying data. We propose a model for private feature learning in which the data holders share differentially private views of their respective datasets to enable collaborative learning of a joint feature map. We give an example of such an algorithm for independent component analysis (ICA), a popular blind source separation algorithm used in neuroimaging analyses. Our algorithm is a differentially private version of the recently proposed distributed joint ICA algorithm. We evaluate the performance of this method on simulated functional magnetic resonance imaging (fMRI) data.
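As a generic illustration of what a "differentially private view" of a local dataset can look like, the sketch below has a data holder release a noisy second-moment matrix using the standard Gaussian mechanism. This is not the differentially private ICA algorithm described above: the function names, the choice of statistic, and the `sensitivity` argument are assumptions for illustration, and the caller is responsible for supplying an L2 sensitivity bound that is valid for their data and neighboring-dataset definition. The noise calibration shown is the classical one, which holds for `epsilon` below 1.

```python
import numpy as np

def gaussian_sigma(sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism noise scale (valid for epsilon < 1)."""
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon

def private_second_moment(X, sensitivity, epsilon, delta, rng):
    """Hypothetical helper: noisy second-moment matrix of X (d x n, samples as columns).

    `sensitivity` is an assumed, caller-supplied L2 sensitivity bound.
    """
    d, n = X.shape
    # Clip each sample to unit norm so a sensitivity bound can hold at all.
    norms = np.linalg.norm(X, axis=0)
    X = X / np.maximum(norms, 1.0)
    A = X @ X.T / n
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    # Symmetric noise: draw the upper triangle and mirror it.
    upper = np.triu(rng.normal(0.0, sigma, size=(d, d)))
    noise = upper + np.triu(upper, k=1).T
    return A + noise

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 500))
A_hat = private_second_moment(X, sensitivity=1.0 / 500,
                              epsilon=0.5, delta=1e-5, rng=rng)
print(A_hat.shape)
```

Each data holder would send only a matrix like `A_hat` to the collaborators, never the raw samples in `X`.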