Modern hyperspectral images, especially those acquired in remote sensing and from on-field measurements, can easily contain from hundreds of thousands to several millions of pixels. This often leads to quite long computational times when, eg, the images are decomposed by Principal Component Analysis (PCA) or similar algorithms. In this paper, we show how randomization can tackle this problem. The main idea was described in detail by Halko et al in 2011 and can be used to speed up most low-rank matrix decomposition methods. The paper explains this approach using a visual interpretation of its main steps and shows how the use of randomness influences the speed and accuracy of the PCA decomposition of hyperspectral images.
1 | INTRODUCTION

Most of the chemometric methods widely used for exploratory analysis, clustering, and classification of hyperspectral images are optimized for cases where the number of observations is much smaller than the number of variables. Using these methods for the analysis of large hyperspectral images (with pixel counts ranging from hundreds of thousands to several millions) often leads to lack of memory and long computational times, making interactive exploration (eg, brushing) almost impossible.

One of the ways to tackle this problem is to use randomness. Probability theory and mathematical statistics teach us that even random numbers follow certain well-defined rules. Using, for example, random sampling to solve deterministic problems is a very widespread approach, with the Monte Carlo method and its various modifications being probably the most popular. Algorithms based on the Monte Carlo method are often used for the analysis of Big Data, including such problems as low-rank approximation of data matrices, eg, singular value decomposition (SVD).1
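As a minimal illustration of random sampling solving a deterministic problem, the classic Monte Carlo estimate of π can be written in a few lines (this is a generic textbook example, not code from the paper; the sample size `n` is an arbitrary choice):

```python
import numpy as np

# Monte Carlo sketch: estimate pi by uniform random sampling,
# illustrating how randomness can solve a deterministic problem.
rng = np.random.default_rng(0)
n = 1_000_000
points = rng.random((n, 2))              # uniform points in the unit square
inside = (points ** 2).sum(axis=1) < 1   # which points fall inside the quarter circle
pi_estimate = 4 * inside.mean()          # area ratio times 4 approximates pi
print(pi_estimate)                       # close to 3.1416 for large n
```

The estimate improves slowly (error shrinks as 1/√n), but the same principle of replacing an exact computation by a random sketch underlies the randomized decompositions discussed below.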
Another way to use randomness to reduce the dimension of original data is random projection,2 where the data are projected onto a set of randomly chosen vectors, so the projection basis in this case is represented by a matrix of random numbers. Since the probability that the determinant of this matrix is 0 is infinitely small, the columns of such a matrix are linearly independent. Varmuza et al3 gave a very good overview of how random projections can be successfully used for solving common chemometric problems, including clustering and classification. It must be noted, though, that random projections are not very efficient for exploratory analysis: in contrast to, eg, conventional PCA and similar methods, they are not good at capturing the directions with largest variance, which is crucial for analysis of hyperspectral images where interactive exploration is important. This problem can be tackled with several additional steps, which make it possible to obtain a basis similar to PCA without increasing the computational time dramatically. There are several ways to achieve this; we will use the approach described by Halko et al.
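The idea behind those additional steps can be sketched as follows. This is a minimal NumPy illustration of the randomized scheme of Halko et al, not the exact implementation used in the paper; the matrix `X` here is synthetic low-rank data standing in for an unfolded hyperspectral image (pixels in rows, wavelengths in columns), and the target rank `k` is an assumed parameter:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic rank-10 "image" matrix: 10 000 pixels x 200 wavelengths.
X = rng.standard_normal((10_000, 10)) @ rng.standard_normal((10, 200))
k = 10

# Step 1: random projection -- compress X onto k random directions.
Omega = rng.standard_normal((X.shape[1], k))
Y = X @ Omega                      # (n_pixels, k) sketch of the column space

# Step 2: orthonormalize the sketch (the extra step that plain random
# projection lacks), giving a basis Q for the range of X.
Q, _ = np.linalg.qr(Y)

# Step 3: project X onto Q and run an exact SVD on the small matrix B.
B = Q.T @ X                        # (k, n_wavelengths), cheap to decompose
U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
U = Q @ U_small                    # approximate left singular vectors of X

print(U.shape, s.shape, Vt.shape)  # (10000, 10) (10,) (10, 200)
```

The expensive SVD of the full matrix is thus replaced by a cheap projection plus an SVD of a small k-row matrix; for PCA proper, `X` would be mean-centred first. For data that are exactly (or nearly) rank k, the reconstruction `(U * s) @ Vt` recovers X up to floating-point error.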