Abstract-In underdetermined blind source separation, more sources are to be extracted from fewer observed mixtures without knowing either the sources or the mixing matrix. $k$-means-style clustering algorithms are commonly used for this purpose given sufficiently sparse sources, but in any case other than deterministic sources, this approach lacks theoretical justification. After establishing that mean-based algorithms converge to wrong solutions in practice, we propose a median-based clustering scheme. Theoretical justification as well as algorithmic realizations (both online and batch) are given and illustrated by some examples.
Index Terms-Blind source separation (BSS), independent component analysis (ICA).
Blind source separation (BSS), mainly based on the assumption of independent sources, is currently studied by many researchers [1], [2]. Given an observed $m$-dimensional mixture random vector $x$, which allows an unknown decomposition $x = As$, the goal is to identify the mixing matrix $A$ and the unknown $n$-dimensional source random vector $s$. Commonly, $A$ is identified first, and only then are the sources recovered. We will therefore denote the former task by blind mixing model recovery (BMMR) and the latter (with known $A$) by blind source recovery (BSR).
In the difficult case of underdetermined or overcomplete BSS, where fewer mixtures than sources are observed ($m < n$), BSR is nontrivial (see Section II). However, our main focus lies on the usually more elaborate matrix recovery. Assuming statistically independent sources with existing variance and at most one Gaussian component, it is well known that $A$ is determined uniquely by $x$ [3]. However, how to do this algorithmically is far from obvious, and although quite a few algorithms have been proposed recently [4]-[6], their performance is still limited. The most commonly used overcomplete algorithms rely on sparse sources (after possible sparsification by preprocessing), which can be identified by clustering, usually by $k$-means or some extension [5], [6]. However, apart from the fact that no theoretical justification has been found, mean-based clustering only identifies the correct $A$ if the data density approaches a delta distribution. In Fig. 1, we illustrate the deficiency of mean-based clustering; we get an error of up to 5° per mixing angle, which is rather substantial considering the sparse density and the simple, complete case of $m = n$. Moreover, the figure indicates that median-based clustering performs much better. Indeed, mean-based clustering does not possess any equivariance property (performance independent of $A$). In the following, we propose a novel median-based clustering method and prove its equivariance (Lemma 1.2) and convergence.
For brevity, the proofs are given for the case of arbitrary $n$ but $m = 2$, although they can be readily extended to higher sensor signal dimensions. Corresponding algorithms are proposed and experimentally validated.
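The contrast between mean-based and median-based cluster updates described above can be illustrated with a small simulation. The following is a minimal sketch, not the algorithm proposed in this paper: it assumes two Laplacian-distributed sources, two mixtures, unit-norm mixing columns at 0° and 60°, and a single assignment-and-update step started from the true mixing directions. All names (`wrap`, `laplace`, `cluster_centre_estimates`) and parameter values are hypothetical choices made for illustration only.

```python
import math
import random
import statistics


def wrap(angle):
    """Map an angle to [-pi/2, pi/2), identifying opposite directions."""
    return (angle + math.pi / 2) % math.pi - math.pi / 2


def laplace(rng):
    """Draw one sample from a zero-mean, unit-scale Laplacian (assumed source model)."""
    return rng.choice((-1.0, 1.0)) * rng.expovariate(1.0)


def cluster_centre_estimates(alpha, num_samples=20000, seed=0):
    """Return (mean, median) in degrees of the angles assigned to the column at 0.

    The mixing matrix has unit columns at angles 0 and `alpha` (radians).
    Samples are assigned to the nearest true direction, so any deviation
    from 0 shows how far one update step would move the estimate.
    """
    rng = random.Random(seed)
    a1 = (1.0, 0.0)
    a2 = (math.cos(alpha), math.sin(alpha))
    members = []
    for _ in range(num_samples):
        s1, s2 = laplace(rng), laplace(rng)
        # Mixture sample x = A s for the 2x2 case.
        x1 = s1 * a1[0] + s2 * a2[0]
        x2 = s1 * a1[1] + s2 * a2[1]
        theta = wrap(math.atan2(x2, x1))
        # Keep the sample if it is angularly closer to column 1 than to column 2.
        if abs(theta) < abs(wrap(theta - alpha)):
            members.append(theta)
    mean_angle = statistics.mean(members)
    median_angle = statistics.median(members)
    return math.degrees(mean_angle), math.degrees(median_angle)


mean_deg, median_deg = cluster_centre_estimates(math.radians(60.0))
print(f"mean update:   {mean_deg:+.2f} degrees from the true direction")
print(f"median update: {median_deg:+.2f} degrees from the true direction")
```

Under these assumptions, the cluster belonging to the first column covers an angular range that is not symmetric about the true direction, so the mean update is pulled away from it by a few degrees while the median update remains close to it. Other source densities or mixing angles will give different magnitudes.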