Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2001
DOI: 10.1145/502512.502546
Random projection in dimensionality reduction: Applications to image and text data

Abstract: Random projections have recently emerged as a powerful method for dimensionality reduction. Theoretical results indicate that the method preserves distances quite nicely; however, empirical results are sparse. We present experimental results on using random projection as a dimensionality reduction tool in a number of cases where the high dimensionality of the data would otherwise lead to burdensome computations. Our application areas are the processing of both noisy and noiseless images, and information retrieval in text documents. …
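As a quick illustration of the distance-preservation property the abstract refers to, here is a minimal sketch (not the authors' code; the dimensions and the Gaussian projection scheme are illustrative assumptions):

    import numpy as np
    from scipy.spatial.distance import pdist

    rng = np.random.default_rng(0)
    d, k, n = 5000, 100, 200                  # original dim, reduced dim, sample size
    X = rng.normal(size=(n, d))               # synthetic high-dimensional data

    R = rng.normal(size=(d, k)) / np.sqrt(k)  # Gaussian random projection matrix
    Y = X @ R                                 # projected data

    # Ratios of projected to original pairwise distances concentrate around 1,
    # i.e. distances are approximately preserved after projection.
    ratios = pdist(Y) / pdist(X)
    print(f"distance ratio: mean {ratios.mean():.3f}, std {ratios.std():.3f}")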


Cited by 1,069 publications (736 citation statements). References 28 publications.
“…In contrast, our algorithm does not suffer from the problems of online self-taught learning approaches [24], as the proposed model with the measurement matrix is data-independent. It has been shown that for image and text applications, random projection can achieve more favorable results than principal component analysis [25].…”
Section: Discussion
confidence: 99%
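The data-independence noted in this excerpt is easy to see in code: a random projection matrix can be drawn before any data are observed, whereas PCA must first be fit to data. A minimal sketch, with illustrative shapes and scikit-learn's PCA as the comparison:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(1)
    d, k = 1024, 50                           # illustrative dimensions

    # Random projection: the matrix is drawn once, independently of the data,
    # so it never needs refitting when new observations arrive.
    R = rng.normal(size=(d, k)) / np.sqrt(k)

    X_train = rng.normal(size=(500, d))
    X_new = rng.normal(size=(10, d))

    Y_new_rp = X_new @ R                      # no training step required

    # PCA, by contrast, is data-dependent: it must be fit before transforming.
    pca = PCA(n_components=k).fit(X_train)
    Y_new_pca = pca.transform(X_new)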
“…Given 0 < ε < 1 and β > 0, and letting R ∈ R^(n×m) be a random matrix projecting data from R^m to R^n, the theoretical bound on the dimension n that satisfies the Johnson-Lindenstrauss lemma is n ≥ (4 + 2β) / (ε²/2 − ε³/3) · ln(d) [16]. In practice, Bingham and Mannila [25] pointed out that this bound is much higher than what suffices to achieve good results on image and text data. In their applications, the lower bound for n when ε = 0.2 is 1600, but n = 50 is sufficient to generate good results.…”
Section: Discussion
confidence: 99%
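The quoted bound is straightforward to evaluate. Below is a small helper; the point count d = 1000 is a placeholder assumption, chosen because it roughly reproduces the bound of 1600 quoted above for ε = 0.2:

    import math

    def jl_lower_bound(eps: float, beta: float, d: int) -> int:
        """Smallest n satisfying n >= (4 + 2*beta) / (eps**2/2 - eps**3/3) * ln(d)."""
        return math.ceil((4 + 2 * beta) / (eps ** 2 / 2 - eps ** 3 / 3) * math.log(d))

    print(jl_lower_bound(eps=0.2, beta=0.0, d=1000))  # 1595, i.e. roughly 1600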
“…In order to generate a single summary vector per shot, the vectors from each feature modality are concatenated, resulting in an even larger vector. Since the spill tree requires a low-dimensional representation, the shot vectors are reduced to 100 dimensions by random projection [3]. Random projection is simple, fast, and known to preserve neighborhood structure [8], making it a good choice for preprocessing nearest-neighbor data.…”
Section: Nearest Neighbor for Scalability
confidence: 99%
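A sketch of the preprocessing step this excerpt describes; the feature modalities and their dimensions are hypothetical, and scikit-learn's GaussianRandomProjection stands in for whatever implementation the cited system actually uses:

    import numpy as np
    from sklearn.random_projection import GaussianRandomProjection

    rng = np.random.default_rng(2)
    color_hist = rng.random((1000, 512))      # hypothetical per-shot features
    texture = rng.random((1000, 2048))
    shots = np.hstack([color_hist, texture])  # concatenated: 2560-dim vectors

    # Reduce to 100 dimensions before building the nearest-neighbor index.
    rp = GaussianRandomProjection(n_components=100, random_state=0)
    shots_100d = rp.fit_transform(shots)
    print(shots_100d.shape)                   # (1000, 100)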
“…When the number of variables is large, the visualisation may include a preprocessing stage of selection or linear transformation [3,4,5]. In a co-clustering method, both sides of the matrix are partitioned [6]; hence the reduction of the variable space and the row clustering occur simultaneously.…”
Section: Introduction
confidence: 99%
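To make the simultaneity concrete, here is a minimal co-clustering sketch on synthetic data, using scikit-learn's SpectralCoclustering as one example of such a method (not necessarily the one used in [6]):

    import numpy as np
    from sklearn.cluster import SpectralCoclustering

    rng = np.random.default_rng(3)
    X = rng.random((60, 40))                  # 60 rows (objects) x 40 variables

    # Rows and columns are partitioned in the same fitting step, so the
    # variable space is reduced while the rows are being clustered.
    model = SpectralCoclustering(n_clusters=4, random_state=0)
    model.fit(X)
    print(model.row_labels_[:10])             # cluster label of each row
    print(model.column_labels_[:10])          # cluster label of each variable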