Modern hyperspectral images, especially those acquired in remote sensing and from on-field measurements, can easily contain from hundreds of thousands to several millions of pixels. This often leads to quite long computational times when, eg, the images are decomposed by Principal Component Analysis (PCA) or similar algorithms. In this paper, we show how randomization can tackle this problem. The main idea was described in detail by Halko et al in 2011 and can be used to speed up most low-rank matrix decomposition methods. The paper explains this approach using a visual interpretation of its main steps and shows how the use of randomness influences the speed and accuracy of the PCA decomposition of hyperspectral images.
1 | INTRODUCTION

Most of the chemometric methods widely used for exploratory analysis, clustering, and classification of hyperspectral images are optimized for cases where the number of observations is much smaller than the number of variables. Using these methods for the analysis of large hyperspectral images (with pixel counts ranging from hundreds of thousands to several millions) often leads to lack of memory and long computational times, making interactive exploration (eg, brushing) almost impossible.

One of the ways to tackle this problem is to use randomness. Probability theory and mathematical statistics teach us that even random numbers follow certain well-defined rules. Using, for example, random sampling to solve deterministic problems is a very widespread approach, with the Monte Carlo method and its various modifications being probably the most popular. Algorithms based on the Monte Carlo method are often used for the analysis of Big Data, including such problems as low-rank approximation of data matrices, eg, singular value decomposition (SVD).1
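As a minimal illustration of random sampling solving a deterministic problem, the classic Monte Carlo estimate of π can be written in a few lines (this is a generic textbook example, not code from the paper; the sample size `n` is an arbitrary choice):

```python
import numpy as np

# Monte Carlo sketch: estimate pi by uniform random sampling,
# illustrating how randomness can solve a deterministic problem.
rng = np.random.default_rng(0)
n = 1_000_000
points = rng.random((n, 2))              # uniform points in the unit square
inside = (points ** 2).sum(axis=1) < 1   # which points fall inside the quarter circle
pi_estimate = 4 * inside.mean()          # area ratio times 4 approximates pi
print(pi_estimate)                       # close to 3.1416 for large n
```

The estimate improves slowly (error shrinks as 1/√n), but the same principle of replacing an exact computation by a random sketch underlies the randomized decompositions discussed below.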
Another way to use randomness to reduce the dimension of original data is random projection,2 where the data are projected onto a set of randomly chosen vectors, so the projection basis in this case is represented by a matrix of random numbers. Since the probability that the determinant of this matrix is 0 is infinitely small, the columns of such a matrix are linearly independent. Varmuza et al3 gave a very good overview of how random projections can be successfully used for solving common chemometric problems, including clustering and classification. It must be noted, though, that random projections are not very efficient for exploratory analysis: in contrast to, eg, conventional PCA and similar methods, they are not good at capturing the directions with largest variance, which is crucial for analysis of hyperspectral images where interactive exploration is important. This problem can be tackled with several additional steps, which make it possible to obtain a basis similar to PCA without increasing the computational time dramatically. There are several ways to achieve this; we will use the approach described by Halko et al.
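The idea behind those additional steps can be sketched as follows. This is a minimal NumPy illustration of the randomized scheme of Halko et al, not the exact implementation used in the paper; the matrix `X` here is synthetic low-rank data standing in for an unfolded hyperspectral image (pixels in rows, wavelengths in columns), and the target rank `k` is an assumed parameter:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic rank-10 "image" matrix: 10 000 pixels x 200 wavelengths.
X = rng.standard_normal((10_000, 10)) @ rng.standard_normal((10, 200))
k = 10

# Step 1: random projection -- compress X onto k random directions.
Omega = rng.standard_normal((X.shape[1], k))
Y = X @ Omega                      # (n_pixels, k) sketch of the column space

# Step 2: orthonormalize the sketch (the extra step that plain random
# projection lacks), giving a basis Q for the range of X.
Q, _ = np.linalg.qr(Y)

# Step 3: project X onto Q and run an exact SVD on the small matrix B.
B = Q.T @ X                        # (k, n_wavelengths), cheap to decompose
U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
U = Q @ U_small                    # approximate left singular vectors of X

print(U.shape, s.shape, Vt.shape)  # (10000, 10) (10,) (10, 200)
```

The expensive SVD of the full matrix is thus replaced by a cheap projection plus an SVD of a small k-row matrix; for PCA proper, `X` would be mean-centred first. For data that are exactly (or nearly) rank k, the reconstruction `(U * s) @ Vt` recovers X up to floating-point error.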