A geometric algorithm for overcomplete linear ICA

Abstract. Random Projection (RP) has drawn great interest in privacy-preserving data mining research due to its high efficiency and security. It was proposed in [27], where the original data set, composed of m attributes, is multiplied by a mixing matrix of dimensions k × m (m > k) that is random and orthogonal in expectation, and the k series of perturbed data are then released for mining purposes. To our knowledge, little work has been done from the attacker's point of view: reconstructing the original data to obtain sensitive information, given the data perturbed by RP and some prior knowledge, e.g., the mixing matrix and the means and variances of the original data. When the attributes of the original data are mutually independent and sparse, the reconstruction can be treated as a problem of Underdetermined Independent Component Analysis (UICA), but UICA has permutation and scaling ambiguities. In this paper we propose a reconstruction framework based on UICA, together with some techniques to reduce these ambiguities. Cases in which the attributes of the original data are correlated and not sparse are also common in data mining. For the typical case of a multivariate Gaussian distribution, we also propose a reconstruction method based on Maximum A Posteriori (MAP) estimation. Our experiments show that our reconstructions achieve high recovery rates and outperform reconstructions based on Principal Component Analysis (PCA).
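As a rough illustration of the Gaussian case mentioned in this abstract, the sketch below perturbs data with a random k × m matrix and then computes the MAP estimate when the mixing matrix, mean, and covariance are known. It is not the authors' implementation: the function names, the choice of i.i.d. N(0, 1/k) matrix entries, and the noise-free model y = Rx are assumptions made for the example. Under those assumptions the MAP estimate is the standard Gaussian conditional mean, x̂ = μ + ΣRᵀ(RΣRᵀ)⁻¹(y − Rμ).

```python
import numpy as np


def random_projection_matrix(k, m, rng):
    """Hypothetical helper: k x m matrix with i.i.d. N(0, 1/k) entries.

    With this scaling E[R.T @ R] is the m x m identity, i.e. the matrix
    is orthogonal in expectation (an assumed construction).
    """
    return rng.normal(0.0, 1.0 / np.sqrt(k), size=(k, m))


def map_reconstruct(Y, R, mu, Sigma):
    """MAP estimate of X from Y = R @ X under the prior x ~ N(mu, Sigma).

    Y     : (k, n) perturbed records, one record per column
    R     : (k, m) known mixing matrix
    mu    : (m,)   prior mean of the original attributes
    Sigma : (m, m) prior covariance of the original attributes
    """
    S = R @ Sigma @ R.T                      # covariance of the perturbed data
    # gain = Sigma R^T S^{-1}; S and Sigma are symmetric, so solve and transpose
    gain = np.linalg.solve(S, R @ Sigma).T
    return mu[:, None] + gain @ (Y - (R @ mu)[:, None])
```

By construction, the estimate reproduces the released data exactly (R @ X_hat equals Y). How close it comes to the original records depends on how much of the covariance structure survives the projection.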


“…For Step 1) a lot of improvement work have been continuously done (e.g. [6], [24], [37] and [41]). For…”

Section: Reconstructions On Multiplicative Data Perturbation

mentioning

confidence: 99%

Reconstructing Data Perturbed by Random Projections When the Mixing Matrix Is Known

Sang

Shen

Tian

2009

Machine Learning and Knowledge Discovery in Databases


“…The FIM is [18] [Eq. (9)]. Then, the covariance of the estimated parameters is bounded as [Eq. (10)]. In our case, we want to compute the FIM from the observation ratio. Therefore, the elements of the FIM should be calculated as [Eq. (11)]. To compute this term, we use (7) and (8). After some straightforward manipulations, this term can be written as [Eq. (12)].…”

Section: System Model and Preliminaries

mentioning

confidence: 99%

“…So, this method is called the EM-LMM method. A geometrical approach was proposed in [11] for estimating the mixing matrix. Recently, [12] proposed a potential-function-based clustering method constructed by a Laplacian-like window function.…”

mentioning

confidence: 99%

On the Cramér-Rao Bound for Estimating the Mixing Matrix in Noisy Sparse Component Analysis

Zayyani

Babaie‐Zadeh

Haddadi

et al. 2008

IEEE Signal Process. Lett.


Abstract: In this letter, we address the theoretical limitations in estimating the mixing matrix in noisy sparse component analysis (SCA) for the two-sensor case. We obtain the Cramér-Rao lower bound (CRLB) on the estimation error of the mixing matrix. Using the Bernoulli-Gaussian (BG) sparse distribution and some simple assumptions, an approximation of the Fisher information matrix (FIM) is calculated. Moreover, this CRLB is compared to some of the main methods of mixing matrix estimation in the literature.

Index Terms: Blind source separation, Cramér-Rao bound, mixing matrix estimation, sparse component analysis.
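To make the setting of this abstract concrete, the following sketch generates Bernoulli-Gaussian sparse sources and a noisy two-sensor mixture. It only simulates the data model and does not compute the bound itself. The parameter names (`p_active`, `sigma_on`, `sigma_off`, `noise_std`), the angle parameterization of the 2 × n mixing matrix with unit-norm columns, and the additive white Gaussian noise are assumptions chosen for illustration, not details taken from the letter.

```python
import numpy as np


def bernoulli_gaussian(n_sources, n_samples, p_active, sigma_on, sigma_off, rng):
    """Sparse sources: each sample is 'active' with probability p_active.

    Active samples are N(0, sigma_on^2); inactive ones are N(0, sigma_off^2),
    with sigma_off much smaller than sigma_on (or 0 for exact sparsity).
    """
    active = rng.random((n_sources, n_samples)) < p_active
    sigma = np.where(active, sigma_on, sigma_off)
    return rng.normal(size=(n_sources, n_samples)) * sigma


def mixing_matrix_from_angles(angles):
    """2 x n mixing matrix whose columns are unit vectors at the given angles."""
    angles = np.asarray(angles, dtype=float)
    return np.vstack([np.cos(angles), np.sin(angles)])


def noisy_mixture(A, S, noise_std, rng):
    """Two-sensor observations X = A S + white Gaussian noise."""
    noise = noise_std * rng.normal(size=(A.shape[0], S.shape[1]))
    return A @ S + noise
```

With unit-norm columns, estimating the mixing matrix in the two-sensor case reduces to estimating one angle per source, which is why an angle parameterization is convenient here.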


“…Given a prior probability on the sources, it can be seen quickly [4], [10] that the most likely source sample is recovered by . Depending on the assumptions on the prior of , we get different optimization criteria.…”

Section: BSR

mentioning

confidence: 99%

“…In the experiments, we will assume a simple prior with any -norm . Then , which can be solved linearly in the Gaussian case and by linear programming or a shortest-path decomposition in the sparse, Laplacian case (see [5], [10]). …”

Section: BSR

mentioning

confidence: 99%
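The two cases named in the excerpt above can be sketched as follows, assuming a known mixing matrix A (m × n, m < n, full row rank) and a single observed mixture vector x. With a Gaussian prior, the most likely source is the minimum 2-norm solution of As = x, obtained linearly through the pseudoinverse. With a Laplacian prior, it is the minimum 1-norm solution, which can be posed as a linear program. The function names are hypothetical, and the LP formulation (splitting s into non-negative parts) is one standard choice rather than the formulation used in the cited works.

```python
import numpy as np
from scipy.optimize import linprog


def recover_l2(A, x):
    """Gaussian prior: minimum 2-norm s with A s = x (linear solution)."""
    return np.linalg.pinv(A) @ x


def recover_l1(A, x):
    """Laplacian prior: minimum 1-norm s with A s = x, via linear programming.

    Write s = u - v with u, v >= 0 and minimize sum(u) + sum(v)
    subject to A u - A v = x.
    """
    m, n = A.shape
    cost = np.ones(2 * n)
    A_eq = np.hstack([A, -A])
    res = linprog(cost, A_eq=A_eq, b_eq=x, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]
```

When only one source is active and no two columns of A are collinear, the 1-norm solution recovers that source exactly, whereas the 2-norm solution generally spreads energy across all sources.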

Median-based clustering for underdetermined blind signal processing

Theis

Puntonet

Lang

2006

IEEE Signal Process. Lett.


Abstract: In underdetermined blind source separation, more sources are to be extracted from fewer observed mixtures without knowing either the sources or the mixing matrix. k-means-style clustering algorithms are commonly used to do this algorithmically given sufficiently sparse sources, but in any case other than deterministic sources, this lacks theoretical justification. After establishing that mean-based algorithms converge to wrong solutions in practice, we propose a median-based clustering scheme. Theoretical justification as well as algorithmic realizations (both online and batch) are given and illustrated by some examples.

Index Terms: Blind source separation (BSS), independent component analysis (ICA).

Blind source separation (BSS), mainly based on the assumption of independent sources, is currently the topic of many researchers [1], [2]. Given an observed m-dimensional mixture random vector x, which allows an unknown decomposition x = As, the goal is to identify the mixing matrix A and the unknown n-dimensional source random vector s. Commonly, A is identified first, and only then are the sources recovered. We will therefore denote the former task by blind mixing model recovery (BMMR) and the latter (with known A) by blind source recovery (BSR).

In the difficult case of underdetermined or overcomplete BSS, where fewer mixtures than sources are observed (m < n), BSR is nontrivial (see Section II). However, our main focus lies on the usually more elaborate matrix recovery. Assuming statistically independent sources with existing variance and at most one Gaussian component, it is well known that A is determined uniquely by x [3]. However, how to do this algorithmically is far from obvious, and although quite a few algorithms have been proposed recently [4]-[6], performance is still limited. The most commonly used overcomplete algorithms rely on sparse sources (after possible sparsification by preprocessing), which can be identified by clustering, usually by k-means or some extension [5], [6].

However, apart from the fact that theoretical justifications have not been found, mean-based clustering only identifies the correct A if the data density approaches a delta distribution. In Fig. 1, we illustrate the deficiency of mean-based clustering; we get an error of up to 5° per mixing angle, which is rather substantial considering the sparse density and the simple, complete case of m = n = 2. Moreover, the figure indicates that median-based clustering performs much better. Indeed, mean-based clustering does not possess any equivariance property (performance independent of A). In the following, we propose a novel median-based clustering method and prove its equivariance (Lemma 1.2) and convergence. For brevity, the proofs are given for the case of arbitrary n, but m = 2, although they can be readily extended to higher sensor signal dimensions. Corresponding algorithms are proposed and experimentally validated.
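The idea of replacing the mean update of k-means with a median update can be sketched for two mixtures as below. This is a minimal batch illustration under stated assumptions, not the algorithm from the paper: each sample of the 2 × T mixture matrix is reduced to its direction angle in [0, π), samples are assigned to the nearest current angle, and each angle is moved to the median of its assigned samples. The function names, the initialization, and the fixed iteration count are hypothetical.

```python
import numpy as np


def wrap(d):
    """Wrap angular differences into [-pi/2, pi/2) (directions have period pi)."""
    return (d + np.pi / 2) % np.pi - np.pi / 2


def median_cluster_angles(X, init_angles, n_iter=50):
    """Estimate mixing angles from 2 x T mixtures by median-based clustering.

    X           : (2, T) observed mixtures
    init_angles : initial guesses for the n mixing angles, in radians
    Returns the estimated angles in [0, pi), sorted ascending.
    """
    X = np.asarray(X, dtype=float)
    keep = np.hypot(X[0], X[1]) > 0          # zero samples carry no direction
    theta = np.arctan2(X[1, keep], X[0, keep]) % np.pi
    centers = np.asarray(init_angles, dtype=float) % np.pi
    for _ in range(n_iter):
        # signed deviation of every sample from every center, shape (n, T)
        dev = wrap(theta[None, :] - centers[:, None])
        labels = np.argmin(np.abs(dev), axis=0)
        for j in range(len(centers)):
            members = dev[j, labels == j]
            if members.size:
                # median update instead of the mean update used by k-means
                centers[j] = (centers[j] + np.median(members)) % np.pi
    return np.sort(centers)
```

Taking the median of the wrapped deviations, rather than of the raw angles, keeps the update well defined for clusters that straddle the 0/π boundary. The columns of the estimated mixing matrix are then (cos θ, sin θ) for each returned angle, up to the usual permutation and scaling ambiguities.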


A geometric algorithm for overcomplete linear ICA

Cited by 93 publications

References 18 publications
